Performance of machine translators in translating French medical research abstracts to English: A comparative study of DeepL, Google Translate, and CUBBITT

Published: 2024-02-01 Issue: 2 Volume: 19 Page: e0297183
ISSN: 1932-6203
Container-title: PLOS ONE
Language: en
Short-container-title: PLoS ONE

Author:

Sebo Paul, de Lucia Sylvain

Abstract

Background: Non-English-speaking researchers may find it difficult to write articles in English and may be tempted to use machine translators (MTs) to facilitate their task. We compared the performance of DeepL, Google Translate, and CUBBITT for the translation of abstracts from French to English.

Methods: We selected ten abstracts published in 2021 in two high-impact bilingual medical journals (CMAJ and Canadian Family Physician) and used nine metrics of Recall-Oriented Understudy for Gisting Evaluation (ROUGE-1 recall/precision/F1-score, ROUGE-2 recall/precision/F1-score, and ROUGE-L recall/precision/F1-score) to evaluate the accuracy of the translation (scores ranging from zero to one [= maximum]). We also used the fluency score assigned by ten raters to evaluate the stylistic quality of the translation (ranging from ten [= incomprehensible] to fifty [= flawless English]). We used Kruskal-Wallis tests to compare the medians between the three MTs. For the human evaluation, we also examined the original English text.

Results: Differences in medians were not statistically significant for the nine metrics of ROUGE (medians: min-max = 0.5246–0.7392 for DeepL, 0.4634–0.7200 for Google Translate, 0.4815–0.7316 for CUBBITT, all p-values > 0.10). For the human evaluation, CUBBITT tended to score higher than DeepL, Google Translate, and the original English text (median = 43 for CUBBITT, vs. 39, 38, and 40, respectively, p-value = 0.003).

Conclusion: The three MTs performed similarly when tested with ROUGE, but CUBBITT was slightly better than the other two using human evaluation. Although we only included abstracts and did not evaluate the time required for post-editing, we believe that French-speaking researchers could use DeepL, Google Translate, or CUBBITT when writing articles in English.
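The nine ROUGE metrics named in the abstract (recall, precision, and F1-score for ROUGE-1, ROUGE-2, and ROUGE-L) can be illustrated with a minimal sketch. This is not the authors' code: it assumes lowercased whitespace tokenization, no stemming, and a balanced F1, and the two sentences are invented toy examples rather than data from the study. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, candidate_total, reference_total):
    """Turn an overlap count into (recall, precision, F1)."""
    recall = overlap / reference_total if reference_total else 0.0
    precision = overlap / candidate_total if candidate_total else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, precision, f1


def rouge_n(candidate, reference, n):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)  # assumption: naive tokenizer
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # & keeps the minimum count per n-gram
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: recall/precision/F1 based on the longest common subsequence."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return prf(lcs_length(cand, ref), len(cand), len(ref))


# Toy sentences (illustrative only, not taken from the study).
reference = "the patient was treated with antibiotics"  # human-written English
candidate = "the patient received antibiotics"          # machine translation

for name, scores in [
    ("ROUGE-1", rouge_n(candidate, reference, 1)),
    ("ROUGE-2", rouge_n(candidate, reference, 2)),
    ("ROUGE-L", rouge_l(candidate, reference)),
]:
    recall, precision, f1 = scores
    print(f"{name}: recall={recall:.4f} precision={precision:.4f} F1={f1:.4f}")
```

Recall is the share of the reference text recovered by the translation, precision is the share of the translation that appears in the reference, and all scores fall between zero and one, matching the range given in the Methods. Published ROUGE toolkits usually add their own tokenization and optional stemming, so their scores can differ from this sketch.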

Publisher

Public Library of Science (PLoS)

Cited by 2 articles.

1. Barriers and enablers encountered by elite athletes during preconception and pregnancy: a mixed-methods systematic review;British Journal of Sports Medicine;2024-08-28

2. Analyzing the diffusion of feminist discourses on Chinese social media: A case study of the 2022 Tangshan restaurant attack;PLOS ONE;2024-08-23