Text Preprocessing for Text Mining in Organizational Research: Review and Recommendations

Published: 2020-11-23, Volume: 25, Issue: 1, Pages: 114-146
ISSN: 1094-4281
Container-title: Organizational Research Methods
Language: en

Author:

Hickman Louis¹, Thapa Stuti¹, Tay Louis¹, Cao Mengyang², Srinivasan Padmini³

Affiliation:

1. Purdue University College of Health and Human Sciences, West Lafayette, IN, USA

2. Independent

3. University of Iowa, Computer Science, Iowa City, IA, USA

Abstract

Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. Although often overlooked, decisions made during text preprocessing affect whether the content and/or style of language are captured, the statistical power of subsequent analyses, and the validity of insights derived from text mining. Past methodological articles have described the general process of obtaining and analyzing text data, but recommendations for preprocessing text data were inconsistent. Furthermore, primary studies use and report different preprocessing techniques. To address this, we conduct two complementary reviews of computational linguistics and organizational text mining research to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted (i.e., open or closed vocabulary), the research question under investigation, and the data set’s characteristics (i.e., corpus size and average document length). Notably, deviations from these recommendations will be appropriate and, at times, necessary due to the unique characteristics of one’s text data. We also provide recommendations for reporting text mining to promote transparency and reproducibility.

Publisher

SAGE Publications

Subject

Management of Technology and Innovation, Strategy and Management, General Decision Sciences

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094428120971683

References: 193 articles.

1. Management Fashion: Lifecycles, Triggers, and Collective Learning Processes

2. Attentional homogeneity in industries: the effect of discretion

3. Using Bisect K-Means Clustering Technique in the Analysis of Arabic Documents

4. First Decade of Organizational Research Methods

5. *Akinola M., Martin A. E., Phillips K. W. (2018). To delegate or not to delegate: Gender differences in affective associations and behavioral responses to delegation. Academy of Management Journal, 61(4), 1467–1491. https://doi.org/10.5465/amj.2016.0662

Cited by 129 articles.

1. Path release among practices in the process of path constitution: How the MP3-path appeared in the field of recorded music; Research Policy; 2024-10

2. Methods of implicit aspect detection in Russian publicism sentences; Modeling and Analysis of Information Systems; 2024-09-13

3. News and Load: A Quantitative Exploration of Natural Language Processing Applications for Forecasting Day-Ahead Electricity System Demand; IEEE Transactions on Power Systems; 2024-09

4. Quantifying the scientist–practitioner gap: How do small business owners react to our academic articles?; Industrial and Organizational Psychology; 2024-08-27

5. Exploration on Advanced Intelligent Algorithms of Artificial Intelligence for Verb Recognition in Machine Translation; ACM Transactions on Asian and Low-Resource Language Information Processing; 2024-08-08