Abstract
Computer scientists in natural language processing (NLP) have focused on the lexical level of language: word counts, ratios, distance, and context. This focus is well suited to semantic tasks as well as syntactic analyses. Corpus linguists, on the other hand, have had a broader focus that also accounts for the lexicogrammatical level of language, and their approach is thus well suited to pragmatic tasks. DocuScope, with its linguistic taxonomy at the lexicogrammatical level, is therefore a unique and complementary tool for the data-driven analysis of large collections of text, addressing the stance and style choices pervasive in linguistic behavior. This chapter looks at how DocuScope’s taxonomy has informed work on a range of problems in public policy at the RAND Corporation. One section of the chapter examines how the DocuScope taxonomy has been used as a statistical tool to find patterns in text corpora, scaling up human qualitative analysis into a mixed-methods text analysis approach, for example in analyzing open-text responses in a large survey of U.S. special forces operators. The second section shows how the DocuScope taxonomy has improved machine learning efforts in both accuracy and interpretability, for example in detecting and understanding conspiracy theory discourse on social media. The chapter ultimately calls for humanistic knowledge as a valuable and necessary complement to technical advances in data-centric disciplines like NLP.
Publisher
John Benjamins Publishing Company
Cited by
1 article.