Performing Statistical Methods on Linked Data

How to Cite

Zapilko, B., & Mathiak, B. (2011). Performing Statistical Methods on Linked Data. International Conference on Dublin Core and Metadata Applications, 116–125.


In recent years, many government agencies have published statistical information as Linked Open Data (e.g. Eurostat). Yet, while a number of visualization tools exist, researchers need the data for scientific statistical analysis to answer their research questions. Currently, they have to download the statistical data in a table-based format in order to use their statistics software, thereby losing the benefits Linked Data provides, such as interlinking with other data sets. In this paper, we present an approach specifically designed to help researchers perform statistical analysis on Linked Data. By combining distributed sources with SPARQL, we are able to apply simple statistical calculations, such as linear regression, and present the results to the user. Testing these calculations with heterogeneous data sources exposes a wide range of typical data integration issues that one has to be aware of when working with heterogeneous statistical data.
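As a rough illustration of the kind of calculation the abstract describes, the following sketch joins two series on a shared key and fits a simple linear regression. It is not the authors' implementation. The function names, the country-code keys, and all values are hypothetical, and the two dictionaries merely stand in for observations that would in practice be retrieved from distributed sources with SPARQL.

```python
from statistics import mean


def join_on_key(xs, ys):
    """Pair values that share a key (e.g. a region code); unmatched keys are dropped."""
    shared = sorted(xs.keys() & ys.keys())
    return [(xs[k], ys[k]) for k in shared]


def linear_regression(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(pairs) < 2:
        raise ValueError("need at least two paired observations")
    x_mean = mean(x for x, _ in pairs)
    y_mean = mean(y for _, y in pairs)
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical observations standing in for two SPARQL result sets.
source_a = {"DE": 1.0, "FR": 2.0, "IT": 3.0, "ES": 4.0}
source_b = {"DE": 3.0, "FR": 5.0, "IT": 7.0, "PL": 9.0}

pairs = join_on_key(source_a, source_b)  # only DE, FR, IT occur in both sources
slope, intercept = linear_regression(pairs)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Note that the join silently drops keys present in only one source (here "ES" and "PL"), which is one small instance of the data integration issues that arise when combining heterogeneous statistical data.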
The copyright for articles is retained by the author(s), with first publication rights granted to DCMI for publication in the electronic and print proceedings. By virtue of their appearance in this open access publication, articles are free to be used with proper attribution for educational and other non-commercial purposes. Other uses may require the permission of the author(s).