A hierarchical topic analysis tool to facilitate digital humanities research

Author:

Chen Chih-MingORCID,Ho Szu-YuORCID,Chang ChungORCID

Abstract

PurposeThis study aims to develop a hierarchical topic analysis tool (HTAT) based on hierarchical Latent Dirichelet allocation (hLDA) to support digital humanities research that is associated with the need of topic exploration on the Digital Humanities Platform for Mr. Lo Chia-Lun’s Writings (DHP-LCLW). HTAT can assist humanities scholars on distant reading with analysis of hierarchical text topics, through classifying time-stamped texts into multiple historical eras, conducting hierarchical topic modeling (HTM) according to the texts from different eras and presenting through visualization. The comparative network diagram is another function provided to assist humanities scholars in comparing the difference in the topics they wish to explore and to track how the concept of a topic changes over time from a particular perspective. In addition, HTAT can also provide humanities scholars with the feature to view source texts, thus having high potential to be applied in promoting the effectiveness of topic exploration due to simultaneously integrating both the topic exploration functions of distant reading and close reading.Design/methodology/approachThis study adopts a counterbalanced experimental design to examine whether there is significant differences in the effectiveness of topic inquiry, the number of relevant topics inquired and the time spent on them when research participants were alternately conducting text exploration using DHP-LCLW with HTAT or DHP-LCLW with Single-layer Topic Analysis Tool (SLTAT). A technology acceptance questionnaire and semi-structured interviews were also conducted to understand the research participants' perception and feelings toward using the two different tools to assist topic inquiry.FindingsThe experimental results show that DHP-LCLW with HTAT could better assist the research participants, in comparison with DHP-LCLW with SLTAT, to grasp the topic context of the texts from two particular perspectives assigned by this study within a short period. In addition, the results of the interviews revealed that DHP-LCLW with HTAT, in comparison with SLTAT, was able to provide a topic terms that better met research participnats' expectations and needs, and effectively guided them to the corresponding texts for close reading. In the analysis of technology acceptance and interview data, it can be found that the research participants have a high and positive tendency toward using DHP-LCLW with HTAT to assist topic inquiry.Research limitations/implicationsThe Jieba Chinese word segmentation system was used in the Mr. Lo Chia-Lun’s Writings Database in this study, to perform word segmentation on Mr. Lo Chia-Lun’s writing texts for topic modeling based on hLDA. Since Jieba word segmentation system is a lexicon based word segmentation system, it cannot identify new words that have still not been collected in the lexicon well. In this case, the correctness of word segmentation on the target texts will affect the results of hLDA topic modeling, and the effectiveness of HTAT in assisting humanities scholars for topic inquiry.Practical implicationsAn HTAT was developed to support digital humanities research in this study. With HTAT, DHP-LCLW provides hmanities scholars with topic clues from different hierarchical perspectives for textual exploration, and with temporal and comparative network diagrams to assist humanities scholars in tracking the evolution of the topics of specific perspectives over time, to gain a more comprehensive understanding of the overall context of the texts.Originality/valueIn recent years, topic analysis technology that can automatically extract key topic information from a large amount of texts has been developed rapidly, but the topics generated from traditional topic analysis models like LDA (Latent Dirichelet allocation) make it difficult for users to understand the differences in the topics of texts with different hierarchical levels. Thus, this study proposes HTAT which uses hLDA to build a hierarchical topic tree with a tree-like structure without the need to define the number of topics in advance, enabling humanities scholars to quickly grasp the concept of textual topics and use different hierarchical perspectives for further textual exploration. At the same time, it also provides a combination function of temporal division and comparative network diagram to assist humanities scholars in exploring topics and their changes in different eras, which helps them discover more useful research clues or findings.

Publisher

Emerald

Subject

Library and Information Sciences,Information Systems

Cited by 6 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3