Dual-path transformer-based network with equalization-generation components prediction for flexible vibrational sensor speech enhancement in the time domain

Published:2022-05 Issue:5 Volume:151 Page:2814-2825
ISSN:0001-4966
Container-title:The Journal of the Acoustical Society of America
language:en

Author:

Zheng Changyan¹, Xu Liguo¹, Fan Xiaohu¹, Yang Jibin², Fan Junyi², Huang Xian³

Affiliation:

1. High-tech Institute, Fan Gong-ting South Street on the 12th, Weifang 261000, China

2. Command and Control Engineering College, Army Engineering University, Nanjing 210007, China

3. Department of Biomedical Engineering, Tianjin University, Tianjin 300072, China

Abstract

The flexible vibrational sensor (FVS) has the potential to become a popular wearable communication device because of its natural noise shielding characteristics and soft materials. However, FVS speech faces a severe loss of frequency components. To improve speech quality, a time-domain neural network model based on the dual-path transformer combined with equalization-generation components prediction (DPT-EGNet) is proposed. More specifically, the DPT-EGNet consists of five modules, namely the pre-processing module, dual-path transformer module, equalization module, generation module, and post-processing module. The dual-path transformer module is leveraged to extract the local and global contextual relationship of long-term speech sequences, which is extremely beneficial for inferring the missing components. The equalization and generation modules are designed according to the characteristics of FVS speech, which further improve the speech quality by simulating the inversion process of the speech distortion. The experimental results demonstrate that the proposed model effectively improves the quality of FVS speech; the average perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and composite measure for overall speech quality (COVL) scores of three males and three females are relatively increased by 64.19%, 29.63%, and 101.37%, which is superior to other baseline models developed in different domains. The proposed model also has significantly lower complexity than the others.
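The abstract does not spell out how the dual-path transformer module handles long sequences. As a rough illustration of the general dual-path idea found in the dual-path literature (not taken from this paper), the sketch below splits a long sequence into overlapping chunks, then exposes a local (intra-chunk) view and a global (inter-chunk) view. The function names, chunk length, and hop size are hypothetical, and the transformer layers that would process each view are omitted.

```python
from typing import List


def segment(seq: List[float], chunk_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D sequence into overlapping chunks, zero-padding the tail.

    chunk_len and hop are illustrative parameters, not values from the paper.
    """
    if chunk_len <= 0 or hop <= 0:
        raise ValueError("chunk_len and hop must be positive")
    # Ceil division gives the number of hops needed to cover the sequence.
    n_chunks = max(1, -(-(len(seq) - chunk_len) // hop) + 1)
    padded_len = (n_chunks - 1) * hop + chunk_len
    padded = list(seq) + [0.0] * (padded_len - len(seq))
    return [padded[i * hop : i * hop + chunk_len] for i in range(n_chunks)]


def inter_chunk_view(chunks: List[List[float]]) -> List[List[float]]:
    """Transpose the chunk matrix: row k holds position k of every chunk.

    Each row spans the whole utterance, so a model applied along it sees
    global context; a model applied along a single chunk sees local context.
    """
    return [list(col) for col in zip(*chunks)]


if __name__ == "__main__":
    samples = [float(i) for i in range(10)]
    chunks = segment(samples, chunk_len=4, hop=2)
    print("intra-chunk (local) rows:", chunks)
    print("inter-chunk (global) rows:", inter_chunk_view(chunks))
```

In a dual-path model, one sub-layer would typically be applied along each chunk and another along each inter-chunk row, with the two alternated over several blocks. The chunk length sets how much context the local path sees and how long the global path's sequences are.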

Funder

National Natural Science Foundation of China

Key Research and Development Program of Zhejiang Province

Publisher

Acoustical Society of America (ASA)

Subject

Acoustics and Ultrasonics, Arts and Humanities (miscellaneous)

Link

https://asa.scitation.org/doi/pdf/10.1121/10.0010316

References: 55 articles.

1. Ba, J. L., Kiros, J. R., and Hinton, G. E. (2016). “Layer normalization,” arXiv:1607.06450.

2. Chen, J., Mao, Q., and Liu, D. (2020). “Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation,” arXiv:2007.13975.

3. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv:1406.1078.

4. Conformable amplified lead zirconate titanate sensors with enhanced piezoelectric response for cutaneous pressure monitoring

5. Dang, F., Chen, H., and Zhang, P. (2021). “DPT-FSNet: Dual-path transformer based full-band and sub-band fusion network for speech enhancement,” arXiv:2104.13002.

Cited by 6 articles.

1. Online bone/air-conducted speech fusion in the presence of strong narrowband noise;Signal Processing;2024-12

2. DPHT-ANet: Dual-path high-order transformer-style fully attentional network for monaural speech enhancement;Applied Acoustics;2024-09

3. A lightweight speech enhancement network fusing bone- and air-conducted speech;The Journal of the Acoustical Society of America;2024-08-01

4. Restoration of Bone-Conducted Speech With U-Net-Like Model and Energy Distance Loss;IEEE Signal Processing Letters;2024

5. A Two-Stage Approach to Quality Restoration of Bone-Conducted Speech;IEEE/ACM Transactions on Audio, Speech, and Language Processing;2024