Abstract

In recent years, many government agencies have published statistical information as Linked Open Data (e.g. Eurostat, data.gov.uk). Yet, while a number of visualization tools exist, researchers need the data for scientific statistical analysis to answer their research questions. Currently, they have to download the statistical data in a table-based format in order to use their statistics software, thereby losing the benefits that Linked Data provides, such as interlinking with other data sets. In this paper, we present an approach specifically designed to help researchers perform statistical analysis on Linked Data. By combining distributed sources with SPARQL, we are able to apply simple statistical calculations, such as linear regression, and present the results to the user. Testing these calculations with heterogeneous data sources exposes a wide range of typical data integration issues that one has to be aware of when working with heterogeneous statistical data.
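To illustrate the kind of calculation mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that observations from two sources have already been retrieved (for instance via SPARQL) and reduced to mappings from a shared key, here a year, to a numeric value. The function names `align_on_key` and `linear_regression`, the variable names, and all data values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def align_on_key(source_a, source_b):
    """Pair values from two sources that share the same key (e.g. a year)."""
    shared = sorted(source_a.keys() & source_b.keys())
    return [(source_a[k], source_b[k]) for k in shared]


def linear_regression(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(pairs) < 2:
        raise ValueError("need at least two paired observations")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up values keyed by year; real data would come from the queried sources.
indicator_a = {2005: 1.0, 2006: 2.0, 2007: 3.0, 2008: 4.0}
indicator_b = {2006: 5.0, 2007: 7.0, 2008: 9.0, 2009: 11.0}

# Only years present in both sources can be used for the regression.
pairs = align_on_key(indicator_a, indicator_b)
slope, intercept = linear_regression(pairs)
```

Note that the alignment step silently drops years that appear in only one source. This is one small instance of the integration issues that arise when heterogeneous statistical data are combined.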

Author information

Benjamin Zapilko
GESIS – Leibniz Institute for the Social Sciences
Brigitte Mathiak
GESIS – Leibniz Institute for the Social Sciences

Cite this article

Zapilko, B., & Mathiak, B. (2011). Performing Statistical Methods on Linked Data. International Conference on Dublin Core and Metadata Applications, 2011. https://doi.org/10.23106/dcmi.952135699

DOI: 10.23106/dcmi.952135699

The metadata and citations of this article are published under the Creative Commons Zero Universal Public Domain Dedication (CC0), allowing unrestricted reuse. Anyone can freely use the metadata from DCPapers articles for any purpose without limitations.
The full text of this article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license allows use, sharing, adaptation, distribution, and reproduction in any medium or format, provided that appropriate credit is given to the original author(s) and the source is cited.