Identifying the influence of transfer learning method in developing an end-to-end automatic speech recognition system with a low data level

Published:2022-02-28 Issue:9(115) Volume:1 Page:84-92
ISSN:1729-4061
Container-title:Eastern-European Journal of Enterprise Technologies
Short-container-title:EEJET

Author:

Mamyrbayev Orken, Alimhan Keylan, Oralbekova Dina, Bekarystankyzy Akbayan, Zhumazhanov Bagashar

Abstract

Today, the best quality and performance of modern speech technologies can be achieved through the widespread use of machine learning methods. The aim of this project is to study and implement an end-to-end automatic speech recognition system using machine learning methods, and to develop new mathematical models and algorithms for automatic speech recognition of agglutinative (Turkic) languages. Many research papers have shown that deep learning methods make it easier to train automatic speech recognition systems that use an end-to-end approach. This approach can also train an automatic speech recognition system directly, that is, without manual work with raw signals. Despite its good recognition quality, the end-to-end model has a drawback: it needs a large amount of training data. This is a serious problem for low-data languages, especially Turkic languages such as Kazakh and Azerbaijani. Solving this problem requires applying various methods. Some methods are used for end-to-end speech recognition of languages that belong to the same family (agglutinative languages). For low-resource languages the method is transfer learning, and for languages with large resources it is multi-task learning. To increase efficiency and quickly solve the problem of limited resources, transfer learning was applied to the end-to-end model. The transfer learning method helped to fit a model trained on the Kazakh dataset to the Azerbaijani dataset. Thus, the model was trained on two language corpora simultaneously. Experiments with the two corpora show that transfer learning can reduce the symbol error rate, the phoneme error rate (PER), by 14.23 % compared to baseline models (DNN+HMM, WaveNet, and CNC+LM). Therefore, the implemented model with the transfer learning method can be used to recognize other low-resource languages.
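
The abstract reports its result as a phoneme error rate (PER) but does not state how the metric is computed. The sketch below assumes the conventional definition: the Levenshtein (edit) distance between the reference and recognized phoneme sequences, divided by the reference length. The function names and the example phoneme sequences are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between the first i-1 reference
    # tokens and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted phoneme out of five.
reference = ["s", "a", "l", "e", "m"]
recognized = ["s", "a", "l", "a", "m"]
print(phoneme_error_rate(reference, recognized))  # 0.2
```

Because the count is normalized by the reference length, the value can exceed 1.0 when the recognized sequence contains many insertions.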

Publisher

Private Company Technology Center

Subject

Applied Mathematics, Electrical and Electronic Engineering, Management of Technology and Innovation, Industrial and Manufacturing Engineering, Computer Science Applications, Mechanical Engineering, Energy Engineering and Power Technology, Control and Systems Engineering

Cited by 11 articles.

1. Multilingual end-to-end ASR for low-resource Turkic languages with common alphabets;Scientific Reports;2024-06-15

2. Comparative Analysis of Models for Neural Machine Speech-to-Text Translation for Turkic State Languages;Lecture Notes in Computer Science;2024

3. Recent Methods and Algorithms in Speech Segmentation Tasks;Communications in Computer and Information Science;2024

4. Salinity Modeling Using Deep Learning with Data Augmentation and Transfer Learning;Water;2023-07-06

5. An end-to-end continuous Kannada ASR system under uncontrolled environment;Multimedia Tools and Applications;2023-06-13