Performing Statistical Methods on Linked Data

Benjamin Zapilko, Brigitte Mathiak

Abstract


In recent years, many government agencies have published statistical information as Linked Open Data (e.g. Eurostat, data.gov.uk). Yet, while there are a number of visualization tools, researchers use data for scientific statistical analysis to answer their research questions. Currently, they have to download the statistical data in a table-based format, in order to use their statistics software, unfortunately losing all the benefits Linked Data provides to them like interlinking with other data sets. In this paper, we present an approach specifically designed to help researchers to perform statistical analysis on Linked Data. By combining distributed sources with SPARQL, we are able to apply simple statistical calculations, such as linear regression and present the results to the user. Results of testing these calculations with heterogeneous data sources expose a wide range of typical issues on data integration which have to be aware of when working with heterogeneous statistical data.

Full Text:

PDF