Full Paper

Metadata Enrichment with Named Entity Recognition using GPT-4

Ashwin Nair, Ee Min Hoon, Robin Dresel

DOI: 10.23106/dcmi.952445840

Abstract

To enhance the user experience and resource discoverability of Infopedia, the Singapore encyclopedia, the National Library Board of Singapore (NLB) uses Generative Pre-trained Transformer 4 (GPT-4) for Named Entity Recognition (NER), aiming to automate metadata enrichment of its digital encyclopedia articles. This initiative leverages GPT-4's ability to identify relevant Singaporean entities, which are then integrated into the NLB's Knowledge Graph to improve recommendations of related resources. An evaluation on a subset of 100 articles yields a precision score of 0.975, indicating that nearly all extracted entities were correct. The team acknowledges challenges related to GPT-4's black-box nature and the potential for non-reproducibility. This effort illustrates the potential of generative AI to streamline metadata enrichment, offering a promising avenue for enhancing the metadata of digital libraries.
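For readers unfamiliar with the metric, precision is the share of extracted entities that are judged correct. The sketch below is illustrative only: the function name and the counts are hypothetical, chosen to reproduce a score of 0.975, and are not the evaluation data or code from the paper.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of extracted entities that are correct: TP / (TP + FP)."""
    extracted = true_positives + false_positives
    if extracted == 0:
        raise ValueError("no entities were extracted")
    return true_positives / extracted


# Hypothetical counts, not taken from the paper: 39 correct entities
# and 1 incorrect entity out of 40 extracted.
example_score = precision(true_positives=39, false_positives=1)
print(round(example_score, 3))
```

Precision says nothing about entities the model failed to extract; that would be measured by recall, which the abstract does not report.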

Author information

Ashwin Nair

National Library Board, SG

Ee Min Hoon

National Library Board, SG

Robin Dresel

National Library Board, SG

Cite this article

Nair, A., Min Hoon, E., & Dresel, R. (2024). Metadata Enrichment with Named Entity Recognition using GPT-4. Proceedings of the International Conference on Dublin Core and Metadata Applications, 2024. https://doi.org/10.23106/dcmi.952445840

Issue

DCMI-2024 Toronto, Canada Proceedings
Location: University of Toronto, Toronto, Ontario, Canada
Dates: October 20-23, 2024
The metadata and citations of this article are published under the Creative Commons Zero Universal Public Domain Dedication (CC0), allowing unrestricted reuse. Anyone can freely use the metadata from DCPapers articles for any purpose without limitations.
The full text of this article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license allows use, sharing, adaptation, distribution, and reproduction in any medium or format, provided that appropriate credit is given to the original author(s) and the source is cited.