Research on the Construction of a Bidirectional Neural Network Machine Translation Model Fused with Attention Mechanism

Published: 2022-08-19 Issue: Volume: 2022 Page: 1-11
ISSN: 1563-5147
Container-title: Mathematical Problems in Engineering
Language: en
Short-container-title: Mathematical Problems in Engineering

Author:

Zuo Guangming¹

Affiliation:

1. Huaiyin Institute of Technology, Huaian, Jiangsu 223000, China

Abstract

With the development of deep learning, neural machine translation has received growing attention from researchers. The encoder-decoder architecture in particular has significantly improved translation performance in natural language processing. In 2014, the attention mechanism was introduced into neural machine translation, which greatly improved translation performance and made the model more interpretable. This research proposes combining sparsemax with an average attention network (AAN) machine translation model and verifies the approach through multiple ablation experiments. It first studies the problem of insufficiently sparse normalization in the attention mechanism when generating target words, and studies a neural machine translation model that incorporates a sparse normalization calculation method. This addresses the problem of inductive bias in the data transfer between related sub-layers of the model. With the sparse normalization strategy, the similarity values of related word vectors can be obtained more accurately when aligning words, which also makes it easier to calculate and analyze the specific principles of the model. In addition, when the model faces a large vocabulary in the decoding stage, weights scattered over too many vocabulary vectors are not conducive to generating the correct target word. The sparse normalization strategy reduces unnecessary calculation between words and improves the classification accuracy over the target vocabulary. Finally, to address the wasted computation of the transformer's decoder in the inference stage, the average attention structure replaces the attention calculation layer of the first layer of the decoder in the original model. Each moment then depends only on the previous moment, which alleviates the waste of computing resources.
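
The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: `sparsemax` follows the standard sort-and-threshold formulation of sparsemax, and `cumulative_average` / `average_step` show only the averaging step of an average attention layer (the gating and feed-forward parts of such a layer are omitted). All function names and toy inputs here are assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    """Dense normalization: every entry receives a nonzero weight."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def sparsemax(z):
    """Sparse normalization: projects scores onto the probability simplex,
    so low-scoring entries can receive exactly zero weight."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum      # entries that stay nonzero
    k_z = k[support][-1]                     # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z        # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)


def cumulative_average(y):
    """Averaging step over a whole sequence: position j sees the mean of
    inputs 1..j. y has shape (sequence_length, model_dim)."""
    y = np.asarray(y, dtype=float)
    steps = np.arange(1, y.shape[0] + 1).reshape(-1, 1)
    return np.cumsum(y, axis=0) / steps


def average_step(running_sum, y_t, t):
    """Incremental form used at inference: step t (1-based) needs only the
    running sum carried over from step t - 1."""
    running_sum = running_sum + y_t
    return running_sum, running_sum / t


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]
    print("softmax  :", softmax(scores))
    print("sparsemax:", sparsemax(scores))

    y = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print("cumulative average:\n", cumulative_average(y))
```

On the toy scores, softmax spreads weight over all three entries, while sparsemax assigns zero weight to the low-scoring ones. The incremental `average_step` reproduces the rows of `cumulative_average` one step at a time, which is the sense in which each decoding step depends only on the state of the previous step.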

Publisher

Hindawi Limited

Subject

General Engineering, General Mathematics

Link

http://downloads.hindawi.com/journals/mpe/2022/2971876.pdf

Reference: 23 articles.

1. Coverage embedding models for neural machine translation;H. Mi

2. Deliberation networks: sequence generation beyond one-pass decoding;F. Xia;Advances in neural information processing systems,2017

3. Language models are unsupervised multitask learners;A. Radford;OpenAI blog,2019

4. Sequence to sequence learning with neural networks;I. Sutskever;Advances in neural information processing systems,2014

Cited by 3 articles.

1. LegalMind System and the LLM-based Legal Judgment Query System;2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies;2024-03-22

2. Design and Construction of Machine Translation System Based on RNN Model;2023 2nd International Conference on Artificial Intelligence and Intelligent Information Processing (AIIIP);2023-10-27

3. Retracted: Research on the Construction of a Bidirectional Neural Network Machine Translation Model Fused with Attention Mechanism;Mathematical Problems in Engineering;2023-08-02