A Comparative Analysis of LSTM and Transformer-based Automatic Speech Recognition Techniques

Published:2024-08-12 Volume:5 Page:272-276
ISSN:2960-2238
Container-title:Transactions on Computer Science and Intelligent Systems Research
Short-container-title:TCSISR

Author:

Zhang Ruijing

Abstract

Automatic Speech Recognition (ASR) is a technology that uses artificial intelligence to convert spoken language into written text. It relies on machine learning algorithms, specifically deep learning models, to analyze audio signals and extract linguistic features. This technology has changed how people interact with voice-enabled devices, enabling efficient and accurate transcription of human speech in applications such as voice assistants, captioning, and transcription services. Among prior approaches to ASR, Long Short-Term Memory (LSTM) networks and Transformer-based methods are two representative solutions. This paper explores the progression of deep learning methods in ASR and compares them. It begins with a historical overview that traces the evolution from early ASR systems to the current benchmarks, LSTM networks and Transformer-based models. It then evaluates these two model families, examining their strengths, weaknesses, and potential for future advances in ASR.

Publisher

Warwick Evans Publishing

References: 13 articles.

1. Rabiner, Lawrence, and Biing-Hwang Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 1986, 3(1): 4-16.

2. Bahl, Lalit R., Frederick Jelinek, and Robert L. Mercer. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1983, 5(2): 179-190.

3. Hinton, Geoffrey, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.

4. Van Houdt, Greg, Carlos Mosquera, and Gonzalo Nápoles. A review on the long short-term memory model. Artificial Intelligence Review, 2020, 53(8): 5929-5955.

5. Zeng, Taiyao. Deep Learning in Automatic Speech Recognition (ASR): A Review. In 2022 7th International Conference on Modern Management and Education Technology, 2022: 173-179.