Large language models as a substitute for human experts in annotating political text

Published:2024-01 Issue:1 Volume:11
ISSN:2053-1680
Container-title:Research & Politics
Language:en
Short-container-title:Research & Politics

Author:

Heseltine Michael¹, Clemm von Hohenberg Bernhard²

Affiliation:

1. Amsterdam School of Communication Research, University of Amsterdam, Amsterdam, Netherlands

2. GESIS Leibniz Institute for the Social Sciences, Cologne, Germany

Abstract

Large-scale text analysis has grown rapidly as a method in political science and beyond. To date, text-as-data methods rely on large volumes of human-annotated training examples, which place a premium on researcher resources. However, advances in large language models (LLMs) may make automated annotation increasingly viable. This paper tests the performance of GPT-4 across a range of scenarios relevant for analysis of political text. We compare GPT-4 coding with human expert coding of tweets and news articles across four variables (whether text is political, its negativity, its sentiment, and its ideology) and across four countries (the United States, Chile, Germany, and Italy). GPT-4 coding is highly accurate, especially for shorter texts such as tweets, correctly classifying texts up to 95% of the time. Performance drops for longer news articles, and very slightly for non-English text. We introduce a ‘hybrid’ coding approach, in which disagreements of multiple GPT-4 runs are adjudicated by a human expert, which boosts accuracy. Finally, we explore downstream effects, finding that transformer models trained on hand-coded or GPT-4-coded data yield almost identical outcomes. Our results suggest that LLM-assisted coding is a viable and cost-efficient approach, although consideration should be given to task complexity.

Publisher

SAGE Publications

Link

http://journals.sagepub.com/doi/pdf/10.1177/20531680241236239

References: 19 articles.

1. Dynamics of Polarizing Rhetoric in Congressional Tweets

2. Synthetic Replacements for Human Survey Data? The Perils of Large Language Models

3. Primary Elections and Candidate Ideology: Out of Step with the Primary Electorate?

4. Desperate Times Call for Desperate Measures: Electoral Competitiveness, Poll Position, and Campaign Negativity

Cited by 2 articles.

1. Accuracy of a large language model in distinguishing anti- and pro-vaccination messages on social media: The case of human papillomavirus vaccination; Preventive Medicine Reports; 2024-06

2. Stylometric Analysis of Large Language Model-Generated Commentaries in the Context of Medical Neuroscience; Lecture Notes in Computer Science; 2024