Lattice BLEU oracles in machine translation
Published:2013-12
Issue:4
Volume:10
Page:1-29
ISSN:1550-4875
Container-title:ACM Transactions on Speech and Language Processing
Language:en
Short-container-title:ACM Trans. Speech Lang. Process.
Author:
Sokolov Artem1,
Wisniewski Guillaume2,
Yvon François2
Affiliation:
1. Universität Heidelberg, Heidelberg, Germany
2. Université Paris Sud and LIMSI-CNRS, Orsay CEDEX, France
Abstract
The search space of Phrase-Based Statistical Machine Translation (PBSMT) systems can be represented as a directed acyclic graph (lattice). By exploring this search space, it is possible to analyze and understand the failures of PBSMT systems. Indeed, useful diagnoses can be obtained by computing the so-called oracle hypotheses, which are the hypotheses in the search space that have the highest quality score. For standard SMT metrics, however, this problem is NP-hard and can only be solved approximately. In this work, we present two new methods for efficiently computing oracles on lattices: the first is based on a linear approximation of the corpus BLEU score and is solved using generic shortest-distance algorithms; the second relies on an Integer Linear Programming (ILP) formulation of oracle decoding that incorporates count clipping constraints, and it can be solved either directly with a standard ILP solver or with Lagrangian relaxation techniques. These new decoders are evaluated and compared with several alternatives from the literature for three language pairs, using lattices produced by two PBSMT systems.
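The idea behind the first method can be illustrated with a minimal sketch. It is not the authors' decoder. It only assumes that once the score is additive over lattice edges, the best path in a directed acyclic graph can be found by a single dynamic-programming pass in topological order. The function name `oracle_path`, the edge format, and the weights `match_reward` and `word_cost` are hypothetical, and the score is a toy unigram reward that ignores count clipping (the constraint that the ILP formulation is said to add).

```python
from collections import defaultdict


def oracle_path(edges, start, end, reference, match_reward=1.0, word_cost=0.25):
    """Return (score, words) of the best path under a toy additive unigram score.

    edges: list of (src, dst, word) tuples; nodes are integers numbered in
    topological order (src < dst), which is an assumption of this sketch.
    """
    ref_words = set(reference.split())

    outgoing = defaultdict(list)
    for src, dst, word in edges:
        outgoing[src].append((dst, word))

    # best[node] = (best score of any path start -> node, words on that path)
    best = {start: (0.0, [])}
    for node in sorted({n for e in edges for n in e[:2]}):
        if node not in best:
            continue  # node not reachable from start
        score, words = best[node]
        for dst, word in outgoing[node]:
            # Additive edge score: reward a reference word, charge a per-word cost.
            gain = (match_reward if word in ref_words else 0.0) - word_cost
            candidate = score + gain
            if dst not in best or candidate > best[dst][0]:
                best[dst] = (candidate, words + [word])
    return best.get(end)


# Hypothetical toy lattice with two alternatives per position.
toy_edges = [
    (0, 1, "the"), (0, 1, "a"),
    (1, 2, "cat"), (1, 2, "dog"),
    (2, 3, "sat"), (2, 3, "sleeps"),
]
score, words = oracle_path(toy_edges, 0, 3, "the cat sat")
print(score, " ".join(words))
```

Because the toy score rewards every occurrence of a reference word, a path that repeats a matching word is over-rewarded; handling this correctly is what clipping constraints are for.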
Funder
OSEO under the Quaero program
Publisher
Association for Computing Machinery (ACM)
Subject
Computational Mathematics,Computer Science (miscellaneous)
Cited by
1 article.
1. Learning to translate queries for CLIR;Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval;2014-07-03