Leveraging bilingual terminology to improve machine translation in a CAT environment

Published:2017-05-30 Issue:5 Volume:23 Page:763-788
ISSN:1351-3249
Container-title:Natural Language Engineering
Language:en
Short-container-title:Nat. Lang. Eng.

Author:

Mihael Arcan, Marco Turchi, Sara Tonelli, Paul Buitelaar

Abstract

This work focuses on the extraction and integration of automatically aligned bilingual terminology into a Statistical Machine Translation (SMT) system in a Computer-Aided Translation (CAT) scenario. We evaluate a framework that takes a small set of parallel documents as input, gathers domain-specific bilingual terms, and injects them into an SMT system to enhance translation quality. To this end, we investigate several strategies for extracting and aligning terminology across languages and for integrating it into an SMT system. We compare two terminology injection methods that can be used at run-time without altering the normal activity of an SMT system: XML markup and a cache-based model. We test the cache-based model on two domains (information technology and medical) in English, Italian and German, showing significant improvements ranging from 2.23 to 6.78 BLEU points over a baseline SMT system and from 0.05 to 3.03 BLEU points over the widely used XML markup approach.

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence,Linguistics and Language,Language and Linguistics,Software

References: 46 articles.

1. The efficacy of human post-editing for language translation

Cited by 17 articles.

1. A study on intelligent translation of English sentences by a semantic feature extractor;Journal of Intelligent Systems;2024-01-01

2. Optimization of Multi-Strategy Machine Translation System Based on AI Technology;2023 International Conference on Mechatronics, IoT and Industrial Informatics (ICMIII);2023-06

3. TermitUp: Generation and enrichment of linked terminologies;Semantic Web;2022-09-26

4. Application of Computer Aided Translation System in Online Learning;2022 International Conference on Education, Network and Information Technology (ICENIT);2022-09

5. Corpus-based bilingual terminology extraction in the power engineering domain;Terminology;2022-04-07