Full Paper

Assessing the Effectiveness of LLMs (Large Language Models) for Extracting Topics and Themes in Survey Responses

Ying-Hsang Liu, Xin Yang, Junzhi Jia

DOI: 10.23106/dcmi.952535736

Abstract

Artificial intelligence (AI) has the potential to automate metadata tasks such as identifying key topics and analyzing recurring themes in text. Topic extraction focuses on recognizing dominant subjects, whereas theme extraction examines patterns of meaning within the text. This study evaluated three large language models (LLMs), DeepSeek R1 (8b), DeepSeek R1 (14b), and Gemma3 (12b), on extracting topics and themes from 50 qualitative survey comments. Using standard information retrieval (IR) methods and metrics, we found that Gemma3 (12b) consistently outperformed the DeepSeek models. Topic detection was handled with reasonable effectiveness: both DeepSeek R1 (8b) and Gemma3 (12b) reached a global F1 of 0.31. Theme detection, however, was significantly more challenging, particularly for the DeepSeek models (global F1 scores of 0.02 and 0.08), while Gemma3 (12b) achieved an F1 of 0.26. Significant document-level variability was also observed. Standard IR metrics can be applied to assess AI performance in metadata tasks, but achieving accuracy comparable to human experts in abstract thematic analysis remains a significant challenge. Developing AI systems that better capture the subtleties of abstract meaning requires human oversight, because these capabilities are critical for supporting complex analytical tasks.
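To make the reported metrics concrete, the following is a minimal sketch of how set-based precision, recall, and F1 could be computed for extracted labels. It rests on assumptions not stated in the abstract: that "global" F1 means pooling true positives, false positives, and false negatives across all comments (micro-averaging), and that document-level variability refers to F1 computed per comment. The function names (`prf`, `evaluate`) and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Set, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts, using 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def evaluate(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[Tuple[float, float, float], Dict[str, float]]:
    """Compare predicted label sets against human-assigned (gold) label sets.

    Returns the pooled ("global", assumed micro-averaged) precision/recall/F1
    and a per-document F1 for inspecting document-level variability.
    """
    total_tp = total_fp = total_fn = 0
    per_doc_f1: Dict[str, float] = {}
    for doc_id, gold_labels in gold.items():
        pred_labels = predicted.get(doc_id, set())
        tp = len(gold_labels & pred_labels)   # labels both agree on
        fp = len(pred_labels - gold_labels)   # labels only the model produced
        fn = len(gold_labels - pred_labels)   # labels the model missed
        per_doc_f1[doc_id] = prf(tp, fp, fn)[2]
        total_tp += tp
        total_fp += fp
        total_fn += fn
    return prf(total_tp, total_fp, total_fn), per_doc_f1


# Hypothetical toy data: two survey comments with expert and model labels.
gold = {"c1": {"workload", "communication"}, "c2": {"training"}}
predicted = {"c1": {"workload"}, "c2": {"training", "salary"}}

global_scores, per_doc_f1 = evaluate(gold, predicted)
print(global_scores, per_doc_f1)
```

Exact set matching, as used here, is the strictest option. An evaluation that credits near-synonymous labels would need an explicit matching rule, which this sketch does not attempt.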

Author information

Ying-Hsang Liu

Professorship of Predictive Analytics, Chemnitz University of Technology, DE

Xin Yang

School of Information Resources Management, Renmin University of China, CN

Junzhi Jia

School of Information Resources Management, Renmin University of China, CN

Cite this article

Liu, Y.-H., Yang, X., & Jia, J. (2025). Assessing the Effectiveness of LLMs (Large Language Models) for Extracting Topics and Themes in Survey Responses. Proceedings of the International Conference on Dublin Core and Metadata Applications, 2025. https://doi.org/10.23106/dcmi.952535736

Issue

DCMI 2025 Conference Proceedings
Location: University of Barcelona, Barcelona, Spain
Dates: October 22-25, 2025
Metadata and citations of this article are published under the Creative Commons Zero Universal Public Domain Dedication (CC0), allowing unrestricted reuse. Anyone can freely use the metadata from DCPapers articles for any purpose without limitations.
The full text of this article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license allows use, sharing, adaptation, distribution, and reproduction in any medium or format, provided that appropriate credit is given to the original author(s) and the source is cited.