Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition

Published: 2021-12-20 Issue: 24 Volume: 10 Page: 3172
ISSN: 2079-9292
Container-title: Electronics
language: en
Short-container-title: Electronics

Authors:

Zhan Qingran, Xie Xiang, Hu Chenguang, Zuluaga-Gomez Juan, Wang Jing, Cheng Haobo

Abstract

Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) for extracting reliable AFs and applies different multi-stream techniques to cross-lingual speech recognition. First, a novel universal definition of phonological attributes is proposed for Mandarin, English, German and French. Then a DANN-based AF detector is trained on the source languages (English, German and French). For cross-lingual speech recognition, the AF detectors transfer phonological knowledge from the source languages (English, German and French) to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features with the cross-lingual AFs. In addition, a monolingual AF system (i.e., one in which the AFs are extracted directly from the target language) is also investigated. Experiments show that the performance of the AF detector can be improved by using convolutional neural networks (CNNs) with a domain-adversarial learning method. The multi-head attention (MHA)-based multi-stream approach achieves the best performance compared with the baseline, the cross-lingual adaptation approach and the other approaches. More specifically, the MHA mode with cross-lingual AFs yields significant improvements over monolingual AFs when the amount of training data is restricted, and it can be easily extended to other low-resource languages.
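Domain-adversarial training of the kind named in the abstract is usually implemented with a gradient reversal layer (GRL), as in the original DANN formulation: the layer is the identity in the forward pass and multiplies the incoming gradient by -λ in the backward pass. The shared feature extractor is therefore pushed to make its features uninformative about the domain (here, the source language) while the AF classifier stays accurate. The following is a minimal, hypothetical pure-Python sketch with scalar weights and hand-derived gradients. It is not the authors' CNN-based detector; the names `w_f`, `w_y`, `w_d`, `lam` and `dann_step` are illustrative assumptions, and a binary domain label stands in for the three source languages, which in a real detector would need a multi-class domain classifier.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_reverse_forward(f: float) -> float:
    """Gradient reversal layer (GRL): identity in the forward pass."""
    return f


def grad_reverse_backward(grad: float, lam: float) -> float:
    """GRL backward pass: flip the sign of the gradient and scale it by lambda."""
    return -lam * grad


def dann_step(params, x, af_label, domain_label, lam=0.1, lr=0.1):
    """One SGD step of a toy scalar DANN (hypothetical illustration).

    params: dict of hypothetical weights
        w_f: shared feature extractor, w_y: AF classifier, w_d: domain classifier.
    af_label: 1 if the articulatory attribute is present, else 0.
    domain_label: 1 or 0, a binary stand-in for the source language.
    Returns (updated params, gradient of the total loss w.r.t. w_f).
    """
    w_f, w_y, w_d = params["w_f"], params["w_y"], params["w_d"]

    # Forward pass
    f = w_f * x                                   # shared feature
    p_y = sigmoid(w_y * f)                        # AF posterior
    p_d = sigmoid(w_d * grad_reverse_forward(f))  # domain posterior through the GRL

    # For binary cross-entropy, dLoss/dlogit = posterior - target
    g_logit_y = p_y - af_label
    g_logit_d = p_d - domain_label

    # Both classifier heads simply minimize their own losses
    g_w_y = g_logit_y * f
    g_w_d = g_logit_d * f

    # The extractor receives the label gradient plus the *reversed* domain gradient,
    # so it descends the AF loss while ascending the domain loss
    g_f_from_y = g_logit_y * w_y
    g_f_from_d = grad_reverse_backward(g_logit_d * w_d, lam)
    g_w_f = (g_f_from_y + g_f_from_d) * x

    new_params = {
        "w_f": w_f - lr * g_w_f,
        "w_y": w_y - lr * g_w_y,
        "w_d": w_d - lr * g_w_d,
    }
    return new_params, g_w_f
```

For example, `dann_step({"w_f": 0.5, "w_y": 1.0, "w_d": 1.0}, x=2.0, af_label=1, domain_label=0, lam=0.5)` returns updated weights; with `lam=0.0` the adversarial term vanishes and the extractor is trained by the AF loss alone. In the original DANN work λ is ramped up from 0 during training so that the noisy early domain signal does not dominate.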

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/10/24/3172/pdf

References (35 articles)

1. Unsupervised Speech Recognition; Baevski; arXiv, 2021

2. Phoneme-level articulatory animation in pronunciation training

3. Inferring articulation and recognizing gestures from acoustics with a neural network trained on x‐ray microbeam data

Cited by 2 articles.

1. Acoustic Scene Classification for Bone-Conducted Sound Using Transfer Learning and Feature Fusion; 2022 5th International Conference on Information Communication and Signal Processing (ICICSP); 2022-11-26

2. Speech Emotion Recognition Exploiting ASR-based and Phonological Knowledge Representations; 2022 the 6th International Conference on Innovation in Artificial Intelligence (ICIAI); 2022-03-04