Leverage Natural Language Processing (NLP) to improve the discoverability of academic resources

Charlene Chou; Shravan Khunti; Harshit Bhargava

doi:10.23106/dcmi.952586098

Project Report

Abstract

This interdisciplinary project is a collaboration among library metadata librarians, data scientists, digital library technologists, university IT, and the university press. Its goal is to improve the discoverability of academic resources by enhancing metadata through Natural Language Processing (NLP) and embedding-based semantic search, addressing the limitations of traditional keyword-based retrieval. To support this pilot, a library NLP system architecture has been designed, including the development of a vector database to enable semantic search within discovery platforms.
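To make the idea of embedding-based semantic search concrete, the following is a minimal sketch of how a vector index ranks records by cosine similarity to a query embedding. It is not the project's architecture: the `VectorIndex` class, the record identifiers, and the three-dimensional vectors are all hypothetical, and a real deployment would presumably obtain high-dimensional embeddings from a language model and store them in a dedicated vector database.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Toy in-memory stand-in for a vector database (hypothetical)."""

    records: list = field(default_factory=list)  # (record_id, unit vector)

    @staticmethod
    def _normalize(vec):
        # Scale to unit length so a dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot use a zero vector")
        return [x / norm for x in vec]

    def add(self, record_id, vec):
        self.records.append((record_id, self._normalize(vec)))

    def search(self, query_vec, k=3):
        # Score every record against the query and return the top k.
        q = self._normalize(query_vec)
        scored = [
            (sum(a * b for a, b in zip(q, v)), rid) for rid, v in self.records
        ]
        scored.sort(reverse=True)
        return [(rid, score) for score, rid in scored[:k]]


# Hypothetical embeddings, made up for illustration only.
index = VectorIndex()
index.add("thesis-on-climate-migration", [0.9, 0.1, 0.0])
index.add("monograph-on-medieval-trade", [0.0, 0.2, 0.9])
index.add("article-on-sea-level-rise", [0.8, 0.3, 0.1])

# A query embedding close to the two climate-related records.
results = index.search([1.0, 0.2, 0.0], k=2)
for record_id, score in results:
    print(f"{record_id}: {score:.3f}")
```

Because ranking depends on vector proximity rather than shared terms, a record can be retrieved even when its metadata contains none of the query's keywords, which is the limitation of keyword-based retrieval that the abstract refers to.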

Author information

Charlene Chou

Division of Libraries, New York University, US

Shravan Khunti

Center for Data Science, New York University, US

Harshit Bhargava

Center for Data Science, New York University, US

Cite this article

Chou, C., Khunti, S., & Bhargava, H. (2025). Leverage Natural Language Processing (NLP) to improve the discoverability of academic resources. Proceedings of the International Conference on Dublin Core and Metadata Applications, 2025. https://doi.org/10.23106/dcmi.952586098

Published: 2025-12-24

Issue

DCMI 2025 Conference Proceedings

Location: University of Barcelona, Barcelona, Spain
Dates: October 22-25, 2025

The metadata and citations of this article are published under the Creative Commons Zero Universal Public Domain Dedication (CC0), which allows unrestricted reuse. Anyone can freely use the metadata from DCPapers articles for any purpose without limitations.

The full text of this article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license allows use, sharing, adaptation, distribution, and reproduction in any medium or format, provided that appropriate credit is given to the original author(s) and the source is cited.