A Neural-Network-Based Approach to Chinese–Uyghur Organization Name Translation

Published:2020-10-21 Issue:10 Volume:11 Page:492
ISSN:2078-2489
Container-title:Information
language:en
Short-container-title:Information

Author:

Wumaier Aishan, Xu Cuiyun, Kadeer Zaokere, Liu Wenqi, Wang Yingbo, Haierla Xireaili, Maimaiti Maihemuti, Tian ShengWei, Saimaiti Alimu

Abstract

The recognition and translation of organization names (ONs) is challenging because of their complex structures and high variability. ONs consist not only of common generic words but also of names, rare words, abbreviations, and business and industry jargon. ONs are a sub-class of named entity (NE) phrases, which convey key information in text, so their correct translation is critical for machine translation and cross-lingual information retrieval. Existing Chinese–Uyghur neural machine translation systems perform poorly on ON translation tasks. Because no Chinese–Uyghur ON translation corpus is publicly available, an ON translation corpus containing 191,641 ON translation pairs is developed. A word segmentation approach involving characterization, tagged characterization, byte pair encoding (BPE) and syllabification is proposed for ON translation tasks. A recurrent neural network (RNN) attention framework and a transformer are adapted for ON translation tasks with different sequence granularities. The experimental results indicate that the transformer model not only outperforms the RNN attention model but also benefits from the proposed word segmentation approach. In addition, a Chinese–Uyghur ON translation system is developed to automatically generate new translation pairs. This work significantly improves Chinese–Uyghur ON translation and can be applied to improve Chinese–Uyghur machine translation and cross-lingual information retrieval. It can also easily be extended to other agglutinative languages.
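
To illustrate two of the sequence granularities named in the abstract, the following is a minimal sketch of character-level segmentation and of generic byte pair encoding (BPE) merge learning. It is not the authors' implementation: the function names (char_segment, learn_bpe, apply_bpe), the "</w>" end-of-word marker, and the toy input strings are all assumptions made for illustration. Tagged characterization and syllabification are not shown because the abstract does not specify them.

```python
from collections import Counter

END = "</w>"  # assumed end-of-word marker, not taken from the paper


def char_segment(name):
    """Split a name into single characters, dropping whitespace."""
    return [ch for ch in name if not ch.isspace()]


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts as a sequence of characters plus the end marker.
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            merged_vocab[merge_pair(symbols, best)] += freq
        vocab = merged_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word by applying learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy inputs only; a real system would train on the ON corpus.
    print(char_segment("新疆大学"))
    rules = learn_bpe(["abab", "abab", "ab"], num_merges=2)
    print(apply_bpe("abab", rules))
```

In such a setup, the character-level function would typically serve the Chinese side, while BPE would be learned on the Uyghur side; which granularity is paired with which model is a design choice the abstract reports only in aggregate.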

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/11/10/492/pdf
