Hybrid CTC-Attention Network-Based End-to-End Speech Recognition System for Korean Language

Published: 2022-01-04
ISSN: 1544-5976
Container-title: Journal of Web Engineering
Short-container-title: JWE

Author:

Park Hosung, Kim Changmin, Son Hyunsoo, Seo Soonshin, Kim Ji-Hwan

Abstract

This study proposes an automatic end-to-end speech recognition system for the Korean language based on a hybrid connectionist temporal classification (CTC)-attention network. Deep neural network/hidden Markov model (DNN/HMM)-based speech recognition systems have driven dramatic improvements in this area. However, such systems are difficult for non-experts to develop for new applications. End-to-end approaches simplify the speech recognition system into a single-network architecture, so a system can be developed without expert knowledge. The proposed model uses a CTC objective function during attention model training, which improves both recognition accuracy and training speed. In most languages, end-to-end speech recognition uses characters as output labels. For Korean, however, character-based end-to-end speech recognition is inefficient because Korean has 11,172 possible characters, a relatively large number compared with other languages. For example, English has 26 characters, and Japanese has 50 characters. To address this problem, we use 49 Korean graphemes as output labels. Experimental results show a 10.02% character error rate (CER) when 740 hours of Korean training data are used.
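The label-set argument in the abstract can be illustrated with a minimal sketch. It assumes the standard Unicode arithmetic for precomposed Hangul syllables (19 initial consonants, 21 vowels, and 28 final-consonant slots including "none"), which is where the figure 11,172 comes from. The function names `decompose`, `compose`, and `jamo_inventory` are hypothetical, and the decomposition shown is Unicode NFD, not the authors' own 49-grapheme inventory, which the abstract does not specify.

```python
import unicodedata

HANGUL_BASE = 0xAC00                       # first precomposed Hangul syllable
N_ONSET, N_VOWEL, N_CODA = 19, 21, 28      # coda slot includes "no final consonant"
N_SYLLABLES = N_ONSET * N_VOWEL * N_CODA   # 11,172 precomposed syllables


def is_hangul_syllable(ch):
    """True if ch is one of the precomposed Hangul syllables."""
    return 0 <= ord(ch) - HANGUL_BASE < N_SYLLABLES


def decompose(text):
    """Split each Hangul syllable into conjoining jamo; keep other characters."""
    labels = []
    for ch in text:
        if is_hangul_syllable(ch):
            # NFD maps a syllable to 2 or 3 position-specific jamo.
            labels.extend(unicodedata.normalize("NFD", ch))
        else:
            labels.append(ch)
    return labels


def compose(labels):
    """Inverse of decompose: NFC recombines jamo sequences into syllables."""
    return unicodedata.normalize("NFC", "".join(labels))


def jamo_inventory():
    """Set of distinct jamo needed to spell every precomposed syllable."""
    return {
        jamo
        for offset in range(N_SYLLABLES)
        for jamo in unicodedata.normalize("NFD", chr(HANGUL_BASE + offset))
    }


if __name__ == "__main__":
    print(N_SYLLABLES)
    print(len(jamo_inventory()))
    print(compose(decompose("한국어")))
```

Under this position-specific decomposition the output vocabulary shrinks from 11,172 syllables to 67 jamo (19 + 21 + 27). The 49 graphemes reported in the abstract are a smaller inventory, presumably defined differently by the authors; the sketch only shows why moving from syllables to sub-syllable units reduces the number of output labels so sharply.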

Publisher

River Publishers

Subject

Computer Networks and Communications, Information Systems, Software

Cited by 3 articles.

1. Retraction Note: COVID-19 Forecast and Bank Credit Decision Model Based on BiLSTM-Attention Network;International Journal of Computational Intelligence Systems;2024-03-27

2. A Deep Learning-Based Multimodal Resource Reconstruction Scheme for Digital Enterprise Management;Journal of Circuits, Systems and Computers;2023-03-03

3. Cognitive Hexagon-Controlled Intelligent Speech Interaction System;IEEE Transactions on Cognitive and Developmental Systems;2022-12