Abstract
Applications of machine learning (ML) in industry and the natural sciences have yielded some of the most impactful innovations of the last decade (for instance, artificial intelligence, gene prediction, or search engines) and have changed the everyday life of many people. From a methodological perspective, we can differentiate between unsupervised machine learning (UML) and supervised machine learning (SML). While SML uses labeled data as input to train algorithms in order to predict outcomes for unlabeled data, UML detects underlying patterns in unlabeled observations by exploiting the statistical properties of the data. The possibilities of ML for analyzing large datasets are slowly finding their way into the social sciences; yet systematic introductions to this epistemologically alien subject are lacking. I present applications of some of the most common methods for SML (i.e., logistic regression) and UML (i.e., topic models). A practical example offers social scientists a “how-to” description for utilizing both. With regard to SML, the case is made by predicting the gender of sociologists in a large dataset. The proposed approach is based on open-source data and outperforms a popular commercial application (genderize.io). Utilizing the predicted gender in topic models reveals stark thematic differences between male and female scholars that have been widely overlooked in the literature. By applying ML, the empirical results hence shed new light on the longstanding question of gender-specific biases in academia.
Publisher
Springer Science and Business Media LLC
Subject
Sociology and Political Science, Social Psychology