Affiliation:
1. Sociology, University of Florida
Abstract
Digitized text has become a popular form of data for sociologists. But text is also the product of many different social and linguistic processes, a topic traditionally examined by sociolinguists in the case of spoken language. This chapter presents a sociolinguistic perspective as a methodological framework when using machine learning to analyze text data. The chapter describes sociolinguistic theories and approaches to studying language while also highlighting ways that this literature tends to differ from traditional sociology. To exemplify this approach, the chapter analyzes a corpus of college admissions essays written by Latinx-identifying applicants using two popular machine-learning-based methods: topic modeling and word embeddings. Similar to variation in spoken language among constituent groups and ethnicities within the panethnic category (e.g., Mexican, Cuban), written language also varies by subgroup in meaningful ways. A sociolinguistic perspective for computational text analysis could spur theoretical and empirical insights.