No Longer Lost in Translation: Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications

Published: 2018-09-11 Volume: 26 Issue: 4 Pages: 417-430
ISSN: 1047-1987
Container-title: Political Analysis
Language: en
Short-container-title: Polit. Anal.

Authors:

Erik de Vries, Martijn Schoonvelde, Gijs Schumacher

Abstract

Automated text analysis allows researchers to analyze large quantities of text. Yet, comparative researchers are presented with a big challenge: across countries people speak different languages. To address this issue, some analysts have suggested using Google Translate to convert all texts into English before starting the analysis (Lucas et al. 2015). But in doing so, do we get lost in translation? This paper evaluates the usefulness of machine translation for bag-of-words models—such as topic models. We use the Europarl dataset and compare term-document matrices (TDMs) as well as topic model results from gold standard translated text and machine-translated text. We evaluate results at both the document and the corpus level. We first find TDMs for both text corpora to be highly similar, with minor differences across languages. What is more, we find considerable overlap in the set of features generated from human-translated and machine-translated texts. With regard to LDA topic models, we find topical prevalence and topical content to be highly similar with again only small differences across languages. We conclude that Google Translate is a useful tool for comparative researchers when using bag-of-words text models.
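The kind of TDM comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' pipeline: the regex tokenizer, the choice of cosine similarity for document-level agreement and Jaccard overlap for corpus-level feature overlap, the function names (`bag_of_words`, `cosine`, `compare_corpora`), and the toy sentences are all illustrative assumptions. It treats a human ("gold standard") translation and a machine translation of the same documents as bags of words and compares them.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase, keep runs of a-z letters, and count them (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def compare_corpora(human_docs, machine_docs):
    """Return per-document cosine similarities and the Jaccard overlap of the two vocabularies.

    human_docs[i] and machine_docs[i] are assumed to be translations of the same source document.
    """
    human_bows = [bag_of_words(d) for d in human_docs]
    machine_bows = [bag_of_words(d) for d in machine_docs]
    # Document level: how similar is each pair of rows in the two TDMs?
    doc_sims = [cosine(h, m) for h, m in zip(human_bows, machine_bows)]
    # Corpus level: how many features (terms) do the two TDMs share?
    human_vocab = set().union(*human_bows)
    machine_vocab = set().union(*machine_bows)
    union = human_vocab | machine_vocab
    jaccard = len(human_vocab & machine_vocab) / len(union) if union else 0.0
    return doc_sims, jaccard


if __name__ == "__main__":
    # Hypothetical toy sentences, not Europarl data.
    human = ["The committee approved the budget proposal.",
             "Parliament debated fishing quotas today."]
    machine = ["The committee adopted the budget proposal.",
               "Parliament discussed fishing quotas today."]
    sims, jac = compare_corpora(human, machine)
    print([round(s, 3) for s in sims], round(jac, 3))  # prints [0.875, 0.8] 0.667
```

Because only word counts enter a bag-of-words model, a machine translation that swaps a few synonyms ("approved" vs. "adopted") still yields largely overlapping rows and vocabularies; a real analysis would typically add stemming, stopword removal, and frequency trimming before comparing.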

Publisher

Cambridge University Press (CUP)

Subject

Political Science and International Relations,Sociology and Political Science

References (25 articles)

1. Translation and the internet: evaluating the quality of free online machine translators; Hampshire; Quaderns: revista de traducció, 2010

2. Computer-Assisted Text Analysis for Comparative Politics; Lucas et al.; Political Analysis, 2015

Cited by 152 articles.

1. A Systematic Review of Oral Vertical Dyskinesia (“Rabbit” Syndrome); Medicina; 2024-08-19

2. Pisa Syndrome Secondary to Drugs: A Scope Review; Geriatrics; 2024-07-30

3. From global diffusion to local semantics: unpacking the scientization of central banks; Socio-Economic Review; 2024-07-29

4. Literacy and Financial Education: Private Providers, Public Certification and Political Preferences; Italian Economic Journal; 2024-07-09

5. Your red isn't my red! Connectionist Structuralism and the puzzle of abstract objects; Synthese; 2024-06-14