Author:
Zong Zhaorong, Hong Changchun
Abstract
Parallel corpora are of great value in machine translation and cross-language information retrieval. With the development of machine learning and deep learning, corpus construction techniques have evolved from word alignment and phrase alignment to chunk alignment. High-quality automatic alignment of bilingual chunks plays an important role in improving the performance of machine translation systems, especially computer-aided translation systems. In this study, the degree of adhesion and the degree of relaxation are used to measure how tight or loose the connection between words is when a chunk is identified, and both can be expressed by a mathematical model. The task of chunk alignment in the construction of a parallel corpus can be described in three steps: inputting bilingual sentences, segmenting chunks, and aligning chunks semantically. At present, most algorithms are based on statistical methods, and the output alignment results are machine-oriented.
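As a rough illustration of the chunk segmentation step, the sketch below scores the connection between adjacent words and starts a new chunk wherever the score falls below a threshold. The abstract does not give the formula for the degree of adhesion or relaxation, so the Dice coefficient over corpus counts is used here purely as a stand-in; the function names (`count_ngrams`, `adhesion`, `relaxation`, `segment_chunks`), the threshold value, and the toy corpus are all assumptions for illustration and are not taken from the paper. Semantic alignment across the two languages is not covered.

```python
from collections import Counter


def count_ngrams(sentences):
    """Count single words and adjacent word pairs in a tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def adhesion(w1, w2, unigrams, bigrams):
    """Stand-in adhesion score (assumption): Dice coefficient in [0, 1].

    1.0 means the two words always occur together as an adjacent pair;
    0.0 means the pair was never observed.
    """
    denom = unigrams[w1] + unigrams[w2]
    if denom == 0:
        return 0.0
    return 2 * bigrams[(w1, w2)] / denom


def relaxation(w1, w2, unigrams, bigrams):
    """Stand-in relaxation score (assumption): complement of adhesion."""
    return 1.0 - adhesion(w1, w2, unigrams, bigrams)


def segment_chunks(tokens, unigrams, bigrams, threshold=0.8):
    """Split a sentence into chunks at weakly connected word boundaries."""
    if not tokens:
        return []
    chunks = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if adhesion(prev, cur, unigrams, bigrams) >= threshold:
            chunks[-1].append(cur)   # tight connection: extend the chunk
        else:
            chunks.append([cur])     # loose connection: start a new chunk
    return chunks


# Toy monolingual corpus (hypothetical data, one side of a bilingual pair).
corpus = [
    ["machine", "translation", "is", "useful"],
    ["machine", "translation", "needs", "parallel", "corpora"],
    ["parallel", "corpora", "are", "useful"],
]
unigrams, bigrams = count_ngrams(corpus)
chunks = segment_chunks(
    ["machine", "translation", "needs", "parallel", "corpora"],
    unigrams, bigrams, threshold=0.8,
)
print(chunks)  # [['machine', 'translation'], ['needs'], ['parallel', 'corpora']]
```

In a full pipeline of the kind the abstract outlines, the same segmentation would be run on both the source and target sentence before the resulting chunks are aligned; any other association measure could be substituted for the Dice score without changing the rest of the sketch.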
Subject
General Physics and Astronomy
Cited by
3 articles.
1. Fine Tuning Language Models: A Tale of Two Low-Resource Languages;Data Intelligence;2024-07-01
2. Algorithm of Creating the “Uzbek-English Aligner” Program;2023 8th International Conference on Computer Science and Engineering (UBMK);2023-09-13
3. Corpus-Based Research on the Use of Foreign Language Chunks;2021 5th Annual International Conference on Data Science and Business Analytics (ICDSBA);2021-09