Abstract
Supervised machine learning is an increasingly popular tool for analyzing large political text corpora. The main disadvantage of supervised machine learning is the need for thousands of manually annotated training data points. This issue is particularly important in the social sciences where most new research questions require new training data for a new task tailored to the specific research question. This paper analyses how deep transfer learning can help address this challenge by accumulating “prior knowledge” in language models. Models like BERT can learn statistical language patterns through pre-training (“language knowledge”), and reliance on task-specific data can be reduced by training on universal tasks like natural language inference (NLI; “task knowledge”). We demonstrate the benefits of transfer learning on a wide range of eight tasks. Across these eight tasks, our BERT-NLI model fine-tuned on 100 to 2,500 texts performs on average 10.7 to 18.3 percentage points better than classical models without transfer learning. Our study indicates that BERT-NLI fine-tuned on 500 texts achieves similar performance as classical models trained on around 5,000 texts. Moreover, we show that transfer learning works particularly well on imbalanced data. We conclude by discussing limitations of transfer learning and by outlining new opportunities for political science research.
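To illustrate the NLI reformulation the abstract describes, the sketch below (not the authors' code; the model name, labels, hypothesis template, and example sentence are illustrative assumptions) shows how a classification task can be posed as entailment: each candidate label is inserted into a hypothesis and scored against the input text with an off-the-shelf NLI model via the Hugging Face transformers zero-shot pipeline.

```python
# Minimal sketch of NLI-based text classification (assumed setup, not the paper's code).
from transformers import pipeline

# Any encoder fine-tuned on an NLI corpus can serve here; this model choice is an assumption.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical political text and candidate topic labels for illustration.
text = "The government should raise the minimum wage to reduce poverty."
candidate_labels = ["economic policy", "immigration", "environment"]

# Each label is slotted into the hypothesis template, e.g.
# "This text is about economic policy.", and scored for entailment.
result = classifier(text, candidate_labels,
                    hypothesis_template="This text is about {}.")

# The top-ranked label and its score.
print(result["labels"][0], result["scores"][0])
```

Because the labels enter the model as natural-language hypotheses, the same NLI model can be applied to a new research question by rewriting the hypotheses, and it can then be fine-tuned on a few hundred task-specific examples rather than thousands.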
Funder
Nederlandse Organisatie voor Wetenschappelijk Onderzoek
Heinrich Böll Stiftung
Publisher
Cambridge University Press (CUP)
Subject
Political Science and International Relations; Sociology and Political Science