Can machine translation systems be evaluated by the crowd alone

Published:2015-09-16 Issue:1 Volume:23 Page:3-30
ISSN:1351-3249
Container-title:Natural Language Engineering
Language:en
Short-container-title:Nat. Lang. Eng.

Author:

Yvette Graham, Timothy Baldwin, Alistair Moffat, Justin Zobel

Abstract

Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy. Agreement with experts is no longer required, and a worker is deemed reliable if they are consistent relative to their own previous work. Individual translations are assessed in isolation from all others in the form of direct estimates of translation quality. This allows more meaningful statistics to be computed for systems and enables significance to be determined on smaller sets of assessments. We demonstrate the methodology's feasibility in large-scale human evaluation through replication of the human evaluation component of the Workshop on Statistical Machine Translation shared translation task for two language pairs, Spanish-to-English and English-to-Spanish. Results for measurement based solely on crowd-sourced assessments show system rankings in line with those of the original evaluation. Comparison of results produced by the relative preference approach and the direct estimate method described here demonstrates that the direct estimate method has a substantially increased ability to identify significant differences between translation systems.

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence,Linguistics and Language,Language and Linguistics,Software

Reference: 66 articles.

1. Randomized Significance Tests in Machine Translation

Cited by 31 articles.

1. Reference-free review-based product question answering evaluation via distant contrastive learning;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

2. Multi-view fusion for universal translation quality estimation;Information Fusion;2024-02

3. Large Language Models Evaluate Machine Translation via Polishing;2023 6th International Conference on Algorithms, Computing and Artificial Intelligence;2023-12-22

4. Video-Captioning Evaluation Metric for Segments (VEMS): A Metric for Segment-level Evaluation of Video Captions with Weighted Frames;Multimedia Tools and Applications;2023-10-28

5. StructureTester: Automatic Machine Translation Testing Based on Variation Feature Vector;2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS);2023-10-22