A Systematic Comparison of Various Statistical Alignment Models

Published:2003-03 Issue:1 Volume:29 Page:19-51
ISSN:0891-2017
Container-title:Computational Linguistics
language:en

Author:

Franz Josef Och¹, Hermann Ney²

Affiliation:

1. University of Southern California, Information Science Institute (USC/ISI), 4029 Via Marina, Suite 1001, Marina del Rey, CA 90292.

2. RWTH Aachen, Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen-University of Technology, D-52056 Aachen, Germany.

Abstract

We present and compare various methods for computing word alignments using statistical or heuristic models. We consider the five alignment models presented in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model, smoothing techniques, and refinements. These statistical models are compared with two heuristic models based on the Dice coefficient. We present different methods for combining word alignments to perform a symmetrization of directed statistical alignment models. As evaluation criterion, we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We evaluate the models on the German-English Verbmobil task and the French-English Hansards task. We perform a detailed analysis of various design decisions of our statistical alignment system and evaluate these on training corpora of various sizes. An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models. In the Appendix, we present an efficient training algorithm for the alignment models presented.
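
The Dice-coefficient heuristic and the combination of two directed alignments mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy corpus, and the choice to count each word once per sentence pair are assumptions made for illustration, and only the two simplest combinations (intersection and union) are shown.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Return the Dice score of every co-occurring (source, target) word pair.

    Assumption of this sketch: a word is counted once per sentence pair
    in which it appears, regardless of how often it repeats there.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_types, tgt_types = set(src), set(tgt)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))
    # dice(s, t) = 2 * C(s, t) / (C(s) + C(t))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def align_by_dice(src, tgt, scores):
    """Link each source position to the best-scoring target position.

    Assumes tgt is non-empty; ties go to the leftmost target position.
    """
    return {
        (i, max(range(len(tgt)), key=lambda j: scores.get((s, tgt[j]), 0.0)))
        for i, s in enumerate(src)
    }


def symmetrize(forward, backward):
    """Return (intersection, union) of two alignments.

    Both inputs are sets of (source_position, target_position) links.
    """
    return forward & backward, forward | backward


# Hypothetical toy corpus, not data from the paper.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "haus"], ["a", "house"]),
]
scores = dice_scores(corpus)
links = align_by_dice(["das", "haus"], ["the", "house"], scores)
```

The intersection keeps only links on which both directions agree, so it tends to favor precision, while the union keeps every link proposed by either direction and tends to favor recall.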

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/089120103321337421

References: 9 articles.

1. Measures of the Amount of Ecologic Association Between Species

Cited by 882 articles.

1. Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning;2024 International Conference on Asian Language Processing (IALP);2024-08-04

2. Mismatching-aware unsupervised translation quality estimation for low-resource languages;Language Resources and Evaluation;2024-05-05

3. Insights into Natural Language Database Query Errors: From Attention Misalignment to User Handling Strategies;ACM Transactions on Interactive Intelligent Systems;2024-03-02

4. Transliteration Characteristics in Romanized Assamese Language Social Media Text and Machine Transliteration;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-02-08

5. Word Sense Disambiguation applied to Assamese-Hindi Bilingual Statistical Machine Translation;Engineering, Technology & Applied Science Research;2024-02-08