A systematic evaluation of text mining methods for short texts: Mapping individuals’ internal states from online posts

Published: 2024-04-04 Issue: 4 Volume: 56 Page: 2782-2803
ISSN: 1554-3528
Container-title: Behavior Research Methods
Language: en
Short-container-title: Behav Res

Author:

Macanovic Ana, Przepiorka Wojtek

Abstract

Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals’ internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders’ evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets, while providing best-practice recommendations.
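The abstract's point about dictionaries producing false positives can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `ANGER_TERMS` word list, the example posts, the human codes, and the helper names `dictionary_code` and `precision_recall` are all hypothetical. It codes each post with a keyword dictionary and scores the result against human codes using precision and recall.

```python
import re

# Hypothetical mini-dictionary for an "anger" category (illustrative only,
# not one of the dictionaries evaluated in the paper).
ANGER_TERMS = {"angry", "furious", "hate", "mad"}


def dictionary_code(text):
    """Return 1 if any dictionary term appears as a whole word, else 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return int(any(tok in ANGER_TERMS for tok in tokens))


def precision_recall(predicted, human):
    """Precision and recall of predicted codes against human codes."""
    pairs = list(zip(predicted, human))
    tp = sum(p == 1 and h == 1 for p, h in pairs)  # true positives
    fp = sum(p == 1 and h == 0 for p, h in pairs)  # false positives
    fn = sum(p == 0 and h == 1 for p, h in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up posts with made-up human codes (1 = expresses anger).
posts = [
    "I am furious about this seller",   # human code 1
    "I would hate to miss this deal",   # human code 0, but matches "hate"
    "Mad respect, great service",       # human code 0, but matches "mad"
    "This makes my blood boil",         # human code 1, no dictionary term
    "Package arrived on time",          # human code 0
]
human_codes = [1, 0, 0, 1, 0]

predicted = [dictionary_code(post) for post in posts]
precision, recall = precision_recall(predicted, human_codes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the dictionary flags three posts but only one matches the human code, so precision is low even though half of the truly relevant posts are found. A supervised model trained on human-coded examples would be scored against the human codes in the same way.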

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.3758/s13428-024-02381-9.pdf

Cited by 1 article.

1. Methods for measuring career readiness of high school students: based on multidimensional item response theory and text mining. Humanities and Social Sciences Communications, 2024-07-16.