Abstract
Large-scale microdata on group identity are critical for studies on identity politics and violence but remain largely unavailable for developing countries. We use personal names to infer religion in South Asia, where religion is a salient social division and yet disaggregated data on it are scarce. Existing work predicts religion using a dictionary-based method and therefore cannot classify unseen names. We provide character-based machine-learning models that classify unseen names as well, with high accuracy. Our models are also much faster and hence scalable to large datasets. We explain the classification decisions of one of our models using the layer-wise relevance propagation technique. The character patterns learned by the classifier are rooted in the linguistic origins of names. We apply these models to infer the religion of electoral candidates using historical data on Indian elections and observe a trend of declining Muslim representation. Our approach can be used to detect identity groups across the world for whom the underlying names might have different linguistic roots.
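The contrast the abstract draws between dictionary lookup and character-based classification can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the eight training names are a hand-picked toy list rather than the paper's data, and the naive Bayes classifier over character n-grams is a simple stand-in, not one of the models the authors report.

```python
import math
from collections import Counter, defaultdict

# Toy, hand-picked examples for illustration only (NOT the paper's data).
TRAIN = [
    ("ramesh", "Hindu"), ("suresh", "Hindu"),
    ("mahesh", "Hindu"), ("rajendra", "Hindu"),
    ("salahuddin", "Muslim"), ("nizamuddin", "Muslim"),
    ("abdullah", "Muslim"), ("rahmatullah", "Muslim"),
]

def char_ngrams(name, n_min=1, n_max=3):
    """Character n-grams of a name, padded so prefixes and suffixes stay distinct."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

class CharNgramNB:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, pairs):
        self.gram_counts = defaultdict(Counter)   # label -> n-gram counts
        self.class_counts = Counter()             # label -> number of names
        for name, label in pairs:
            self.class_counts[label] += 1
            self.gram_counts[label].update(char_ngrams(name))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        return self

    def log_score(self, name, label):
        counts = self.gram_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        prior = self.class_counts[label] / sum(self.class_counts.values())
        return math.log(prior) + sum(
            math.log((counts[g] + 1) / denom) for g in char_ngrams(name))

    def predict(self, name):
        return max(self.class_counts, key=lambda lab: self.log_score(name, lab))

# Dictionary-based baseline: exact lookup, so unseen names get no label.
lookup = dict(TRAIN)
model = CharNgramNB().fit(TRAIN)

for unseen in ["dinesh", "fakhruddin"]:
    print(unseen, "| dictionary:", lookup.get(unseen),
          "| char n-gram model:", model.predict(unseen))
```

Neither test name appears in the training list, so the dictionary returns no label, while the n-gram model can still score them through shared substrings such as "esh" or "uddin".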
Publisher
Cambridge University Press (CUP)
Subject
Political Science and International Relations, Sociology and Political Science
Cited by
5 articles.