Design of New Word Retrieval Algorithm for Chinese-English Bilingual Parallel Corpus

Published:2022-03-26 Issue: Volume:2022 Page:1-9
ISSN:1563-5147
Container-title:Mathematical Problems in Engineering
language:en
Short-container-title:Mathematical Problems in Engineering

Author:

Zhang Liting¹

Affiliation:

1. Faculty of Humanities, Gansu Agricultural University, Lanzhou 730070, China

Abstract

Natural language processing is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Machine translation is a branch of natural language processing research that relies on large-scale English-Chinese databases. Because aligned corpora of English-Chinese bilingual sentences containing unknown words are relatively poor, machine translation output is unprofessional and unbalanced; this is the problem studied in this paper. The purpose of this paper is to design and implement a length-based system for sentence alignment between English and Chinese bilingual texts. The research content is divided into the following parts. First, an evaluation function for bilingual sentence alignment is designed, and on this basis a length-based bilingual sentence alignment algorithm and a search algorithm for the optimal sentence-pair sequence are designed. China National Knowledge Infrastructure (CNKI) is selected as the candidate English-Chinese bilingual website, and its English-Chinese bilingual web pages are downloaded. After the downloaded pages are analyzed, nontext content such as page tags is removed, and the bilingual text is stored to establish a segment-aligned English-Chinese bilingual corpus, retaining the English-Chinese bilingual keywords in the web pages. Second, the dictionary is extracted from the StarDict software, the original dictionary format is analyzed, and it is converted into a custom format that is more convenient for the bilingual sentence alignment system to use, which helps expand the number of dictionaries and increase the professionalism of the vocabulary. Third, the stems of English words are extracted from the established corpus to simplify English word processing, reduce the noise caused by changes in part of speech, and improve operating efficiency. Finally, a length-based bilingual sentence alignment system is implemented, and the system parameters are adjusted in comparative experiments to test system performance.
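
The abstract does not give the evaluation function or the search procedure, so the following is only a minimal sketch of how length-based sentence alignment is commonly set up, not the paper's method. It assumes sentence length is measured in characters, that a target sentence is expected to be `ratio` times as long as its source, and that the optimal sentence-pair sequence is found by dynamic programming over a small set of alignment shapes. The names `BEADS`, `length_cost`, and `align`, the penalty values, and the example sentences are all hypothetical.

```python
import math

# Allowed alignment shapes (source sentences, target sentences), each with a
# fixed penalty. The shapes and penalty values are illustrative assumptions.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len, tgt_len, ratio):
    """Hypothetical evaluation function: scaled mismatch between the target
    length and the length expected from the source (src_len * ratio)."""
    expected = src_len * ratio
    return abs(tgt_len - expected) / math.sqrt(expected + tgt_len + 1.0)


def align(src, tgt, ratio=1.0):
    """Return the lowest-cost list of (source_indices, target_indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]  # best[i][j]: cost of aligning src[:i], tgt[:j]
    back = [[None] * (m + 1) for _ in range(n + 1)]  # predecessor cell on the best path
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                cost = best[i][j] + penalty + length_cost(s_len, t_len, ratio)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Walk the back pointers from the end to recover the sentence-pair sequence.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


# Made-up example: Chinese text is shorter in characters, so ratio < 1.
english = ["Short.", "This is a considerably longer second sentence.", "End."]
chinese = ["短。", "这是一个明显更长的第二个句子。", "完。"]
for src_ids, tgt_ids in align(english, chinese, ratio=0.33):
    print(src_ids, tgt_ids)
```

In a sketch like this, `ratio` and the penalties play the role of the system parameters that would be tuned in comparative experiments; a dictionary-based score could be added to `length_cost` without changing the search.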

Funder

Gansu Agricultural University

Publisher

Hindawi Limited

Subject

General Engineering, General Mathematics

Link

http://downloads.hindawi.com/journals/mpe/2022/6399375.pdf

References (34 articles)

1. A beam search decoder for phrase-based statistical machine translation models: training manual;P. Koehn;USC Information Sciences Institute,2004

2. Mining translations of oov terms from the web through crosslingual query expansion;Z. Ying

3. Web-based query translation for English-Chinese CLIR;C. Y. Lu;Computational Linguistics and Chinese Language Processing,2008

4. Discovering parallel text from the world wide web;C. Jisong

5. A research on bilingual dictionary based sentence alignment for Chinese English parallel corpus;Y. Muyun;High Technology Letters,2002

Cited by 3 articles.

1. Integration and Innovation of Artificial Intelligence and Traditional English Translation Methods;Applied Mathematics and Nonlinear Sciences;2024-01-01

2. Construction of English Numerical Intelligence Text Translation Data Corpus in Colleges and Universities;Applied Mathematics and Nonlinear Sciences;2024-01-01

3. Differential Analysis of Stylistic Features in Chinese-English Interpretation Based on Natural Language Processing;Applied Mathematics and Nonlinear Sciences;2023-12-16