The interpretation of topic models for scholarly analysis: An evaluation and critique of current practice

Author:

Gillings Mathew (1), Hardie Andrew (2)

Affiliation:

1. Institute for English Business Communication, Vienna University of Economics and Business, Vienna 1020, Austria

2. Department of Linguistics and English Language, Lancaster University, Lancaster LA1 4YL, UK

Abstract

Topic modelling is a method of statistical data mining of a corpus of documents, popular in the digital humanities and, increasingly, in social sciences. A critical methodological issue is how ‘topics’ (groups of co-selected word types) can be interpreted in analytically meaningful terms. In the current literature, this is typically done by ‘eyeballing’; that is, cursory and largely unsystematic examination of the ‘top’ words in each algorithmically identified word group. We critically evaluate this approach in a dual analysis, comparing the ‘eyeballing’ approach with an alternative using sample close reading across the corpus. We used MALLET to extract two topic models from a test corpus: one with stopwords included, another with stopwords excluded. We then used the aforementioned methods to assign labels to these topics. The results suggest that a close-reading approach is more effective not only in level of detail but even in terms of accuracy. In particular, we found that: assigning labels via eyeballing yields incomplete or incorrect topic labels; removing stopwords drastically affects the analysis outcome; topic labelling and interpretation depend considerably on the analysts’ specialist knowledge; and differences of perspective or construal are unlikely to be captured through a topic model. We conclude that an interpretive paradigm founded in close reading may make topic modelling more appealing to humanities researchers.

Funder

UK Economic and Social Research Council

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications,Linguistics and Language,Language and Linguistics,Information Systems


Cited by 5 articles.

1. Taking the Road Less Travelled: How Corpus‐Assisted Discourse Studies Can Enrich Qualitative Explorations of Large Textual Datasets;British Journal of Management;2024-03

2. Topic modelling literary interviews from The Paris Review;Digital Scholarship in the Humanities;2024-01-10

3. Automated Topic Exploration in a Cultural Heritage Corpus;Communications in Computer and Information Science;2024

4. Corpus-Assisted Discourse Studies;2023-03-20

5. Artificial Intelligence in Historical Research: Potential and Limits of Effectiveness;V International Scientific Conference «MIP-V-2023: Modernization, Innovations, Progress»;2023
