Affiliation:
1. National Taiwan University
Abstract
Abstract
The past few decades have seen the rapid development of topic modeling. So far, research has been more concerned with determining the ideal number of topics or meaningful topic clustering words than with applying topic modeling techniques to evaluate linguistic theories. This study proposes the Structural Topic Model (STM)-led framework to facilitate the interpretation of topic modeling results and standardize text analysis. STM encompasses various model training mechanisms, thereby requiring systematic designs to properly combine language studies. “Structural” in STM refers to the inclusion of metadata structure. Unlike the corpus-based keyness approach, STM can capture contextual cues and meta-information for the interpretation of topical results. Besides, STM can make cross-corpora comparisons via topical contrast, a challenging task for corpus-driven related models such as the Biterm Topic Model (BTM). Stylistic variations in song lyrics are taken as an illustration to show how to use the suggested framework to delve into the linguistic theory proposed by Pennebaker (2013). The topical model and iterable model in the proposed paradigm can clarify how pronouns affect style distinction. We believe the proposed STM-led framework can shed light on text analysis by conducting a reproducible cross-corpora comparison on short texts.
Publisher
John Benjamins Publishing Company
Subject
Linguistics and Language,Language and Linguistics