Affiliation:
1. Hubei University of Education, Wuhan, China
Abstract
Previous machine translation models, such as statistical machine translation (SMT), rule-based machine translation (RBMT), hybrid machine translation (HMT), and earlier neural machine translation (NMT) architectures, have reached their performance bottleneck. The Transformer-based machine translation model has since become the preferred choice for English translation. For instance, Google's BERT organizes the Transformer module into bidirectional encoder representations. It captures users' search intentions as well as the material the search engine has indexed and, unlike RankBrain, does not need to evaluate previous searches to understand what people mean. BERT comprehends words, sentences, and whole passages much as humans do, achieves remarkable translation quality improvements over other state-of-the-art benchmarks, and demonstrates the great potential of the Transformer model. However, Transformer-based translation models gain this performance at the cost of growing model size and complexity, usually requiring parameters on the million scale. Traditional computing systems struggle to cope with the growing memory and computation requirements, and even though high-end machines can run these models, the biggest challenge is to deploy them efficiently onto real-time or embedded devices. In this work, we propose a quantization scheme that reduces parameter and computation complexity, which is of great importance for promoting the use of the Transformer model. Our experimental results show that the original 32-bit floating-point Transformer model can be quantized to only 8 to 12 bits with negligible translation quality loss, a loss small enough to be acceptable in practice. Meanwhile, our algorithm achieves a … to … compression ratio, which helps save the required complexity and energy during the inference phase.
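As a rough illustration of the idea summarized above, the sketch below applies post-training symmetric uniform quantization to a stand-in float32 weight matrix and reports the resulting compression ratio relative to 32-bit floats. The per-tensor scheme, function names, and NumPy implementation are illustrative assumptions only and are not taken from the paper's actual quantization algorithm.

```python
# Minimal sketch of symmetric uniform quantization (illustrative assumption,
# not the paper's exact scheme).
import numpy as np

def quantize_weights(w: np.ndarray, bits: int = 8):
    """Quantize a float32 weight tensor to `bits`-bit signed integer codes.

    Returns the integer codes and the scale needed to dequantize them.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax           # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the integer codes."""
    return (q * scale).astype(np.float32)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(512, 512)).astype(np.float32)  # stand-in weight matrix

    for bits in (8, 12):
        q, scale = quantize_weights(w, bits)
        err = np.abs(w - dequantize(q, scale)).mean()
        # Compression ratio counts storage only: 32-bit floats vs. `bits`-bit codes.
        print(f"{bits}-bit: mean abs error {err:.5f}, "
              f"compression ratio {32 / bits:.1f}x")
```

With this simple per-tensor scheme, 8-bit and 12-bit codes give 4.0x and roughly 2.7x storage compression respectively; the quantization error they introduce is what the paper measures as translation quality loss.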
Funder
Hubei University of Education
Subject
Computer Networks and Communications, Computer Science Applications
Cited by
3 articles.
1. Translation English to Punjabi: A Concise Review of Significant Approaches;2024 International Conference on Computational Intelligence and Computing Applications (ICCICA);2024-05-23
2. Translation Systems: A Synoptic Survey of Deep Learning Approaches and Techniques;2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO);2024-03-14
3. Enhancing Neural Text Detector Robustness with μAttacking and RR-Training;Electronics;2023-04-21