Analysis of user-supplied metadata in a health sciences institutional repository

Joelen Pastva

Abstract


Launched in October 2015 by the Galter Health Sciences Library, the DigitalHub repository is designed to capture and preserve the scholarly outputs of Northwestern Medicine. A major motivation to deposit in the repository is the possibility of improved citations and discovery of resources; however, one of the largest barriers to discovery is a lack of descriptive metadata. Because DigitalHub was designed for ease of use, only minimal metadata is required to deposit a resource successfully. However, many optional descriptive metadata fields are also available to encourage the consistent and detailed entry of descriptive information. The library wanted to evaluate how users were approaching the available metadata fields and accompanying instructions before the library performed any metadata enhancement. To evaluate user-supplied metadata, all DigitalHub metadata covering a 2.5-year period was exported. Records previously enhanced by librarians or initially deposited by library staff were excluded from consideration. The metadata was then evaluated for completeness, choice of dropdown terms for resource type, inclusion of collaborators, use of controlled vocabulary fields, and any areas that indicated a clear misunderstanding of the intended use of a metadata field. This poster presents the preliminary findings of this analysis. It is hoped that the findings will help guide future system and interface design decisions, cleanup activities, and library instruction activities. Ultimately, the goal is to make the interface as usable and effective as possible, encouraging depositors to supply an optimal amount of descriptive metadata upfront and to continue using the repository in the future. These results should be of interest to repository managers who rely on users to supply initial descriptive metadata, especially in health sciences disciplines.
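For repository managers who want to run a similar check, the exclusion and completeness steps described above might be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the analysis: the column names (depositor_role, enhanced_by_librarian, and the optional fields) are hypothetical, since the DigitalHub export schema is not described here, and the sample rows are invented placeholders rather than study data.

```python
import csv
import io

# Hypothetical optional fields; the actual DigitalHub export schema is not given.
OPTIONAL_FIELDS = ["description", "keywords", "contributors", "subject_mesh"]


def user_supplied(records):
    """Drop records deposited by library staff or already enhanced by librarians."""
    return [
        r for r in records
        if r["depositor_role"] != "library_staff"
        and r["enhanced_by_librarian"] != "yes"
    ]


def completeness(records, fields=OPTIONAL_FIELDS):
    """Return, per field, the fraction of records with a non-empty value."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if (r.get(f) or "").strip()) / total
        for f in fields
    }


# Invented placeholder rows standing in for a real metadata export.
SAMPLE_EXPORT = """id,depositor_role,enhanced_by_librarian,description,keywords,contributors,subject_mesh
1,faculty,no,A poster on sepsis,sepsis,,
2,library_staff,no,Annual report,reports,Smith J,Libraries
3,student,no,,,Lee K,
4,faculty,yes,Slides,education,Doe A,Education
"""

records = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
kept = user_supplied(records)
rates = completeness(kept)

for field, rate in rates.items():
    print(f"{field}: {rate:.0%}")
```

Reporting a per-field fill rate, rather than a single score per record, makes it easier to see which optional fields depositors tend to skip and therefore where interface changes or instruction might help most.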