Extracting Description Set Profiles from RDF Datasets using Metadata Instances and SPARQL Queries

Tsunagu Honma, Kei Tanaka, Mitsuharu Nagamori, Shigeo Sugimoto

Abstract


A variety of communities create and publish metadata as Linked Open Data (LOD). Users of those datasets find and use them for their own purposes and may combine datasets to add value. Each LOD dataset uses its own vocabularies, structures and constraints to describe resources. To improve the usability of LOD datasets, metadata designers need to make their own metadata interoperable with that of other datasets. To create new interoperable metadata, metadata schema designers must understand the Application Profiles of existing LOD datasets. Dublin Core Description Set Profiles (DSPs) are a component of DCMI Application Profiles. A DSP describes the structures and constraints of metadata in an application (e.g. resource classes, property cardinality, value schemes). Metadata schema registries, which collect and provide metadata schemas, have great potential to help metadata schema designers find, compare and adopt existing schemas. However, most LOD datasets are not published with their DSPs. As a result, metadata schema designers have to examine each dataset and guess its DSP. This paper proposes a method for automatically extracting the structural constraints of metadata records from metadata instances, using existing metadata schemas. The goal of this study is to reduce the cost of metadata schema extraction and to increase the number of metadata schemas registered in metadata schema registries. We experimentally extracted constraints from LOD datasets using SPARQL queries. To evaluate the approach, we applied it to 10 datasets in the DataHub. Comparing the structural constraints extracted by our approach with those extracted manually, we found that our approach extracted more constraints.
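As an illustration of the kind of constraints a DSP records, the following Python sketch derives per-class property cardinalities and literal/non-literal value types from a small in-memory list of triples. It is not the authors' SPARQL-based method: the names `Literal`, `StatementTemplate` and `extract_templates` are hypothetical, the DSP model is heavily simplified, and against a real LOD endpoint the same counts would more likely be computed with SPARQL 1.1 aggregates (`COUNT` with `GROUP BY`). Observed cardinalities only describe the sample at hand; a property that happens to appear once per instance is not necessarily constrained to do so.

```python
from collections import defaultdict
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Literal:
    """Hypothetical marker for literal objects; IRIs are plain strings."""
    value: str


@dataclass(frozen=True)
class StatementTemplate:
    """Hypothetical, simplified stand-in for a DSP statement template."""
    prop: str
    min_occurs: int
    max_occurs: int
    value_kinds: frozenset  # subset of {"literal", "nonliteral"}


def extract_templates(triples):
    """Return {class IRI: [StatementTemplate, ...]} observed in the triples."""
    # Instances of each class, taken from rdf:type triples.
    instances = defaultdict(set)
    # Per subject: how often each property occurs, and the kinds of its values.
    counts = defaultdict(lambda: defaultdict(int))
    kinds = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p == RDF_TYPE:
            instances[o].add(s)
        else:
            counts[s][p] += 1
            kinds[s][p].add("literal" if isinstance(o, Literal) else "nonliteral")

    description = {}
    for cls, subjects in instances.items():
        props = sorted({p for s in subjects for p in counts[s]})
        templates = []
        for p in props:
            # Instances lacking the property count as 0 occurrences,
            # so an optional property gets min_occurs == 0.
            per_subject = [counts[s].get(p, 0) for s in subjects]
            value_kinds = set()
            for s in subjects:
                value_kinds |= kinds[s].get(p, set())
            templates.append(StatementTemplate(
                p, min(per_subject), max(per_subject), frozenset(value_kinds)))
        description[cls] = templates
    return description


# Usage with a tiny, made-up FOAF sample.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
triples = [
    (EX + "alice", RDF_TYPE, FOAF + "Person"),
    (EX + "alice", FOAF + "name", Literal("Alice")),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "alice", FOAF + "knows", EX + "carol"),
    (EX + "bob", RDF_TYPE, FOAF + "Person"),
    (EX + "bob", FOAF + "name", Literal("Bob")),
]
person = extract_templates(triples)[FOAF + "Person"]
for t in person:
    print(t.prop, t.min_occurs, t.max_occurs, sorted(t.value_kinds))
# http://xmlns.com/foaf/0.1/knows 0 2 ['nonliteral']
# http://xmlns.com/foaf/0.1/name 1 1 ['literal']
```

Counting absent properties as zero is what lets the sketch distinguish optional properties (`min_occurs == 0`) from ones present on every observed instance.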
