
An Unsupervised Two-Talker Speech Separation System Based on CASA

Published:2018-03-14 Issue:07 Volume:32 Page:1858002
ISSN:0218-0014
Container-title:International Journal of Pattern Recognition and Artificial Intelligence
language:en
Short-container-title:Int. J. Patt. Recogn. Artif. Intell.

Author:

Li Hongyan¹,Wang Yue¹,Zhao Rongrong¹,Zhang Xueying¹

Affiliation:

1. School of Information Engineering, Taiyuan University of Technology, Jinzhong, Shanxi 030600, P. R. China

Abstract

Building on the theory of blind monaural speech separation based on computational auditory scene analysis (CASA), this paper proposes a two-talker speech separation system that combines CASA with speaker recognition to separate target speech from interfering speech. First, a tandem algorithm is used to organize voiced speech; then, based on the clustering of gammatone frequency cepstral coefficients (GFCCs), an objective function is established to recognize the speaker, and the best grouping is found through exhaustive search or beam search, so that voiced speech is organized sequentially. Second, unvoiced segments are generated by estimating onsets and offsets, and the unvoiced–voiced (U–V) segments and unvoiced–unvoiced (U–U) segments are then treated separately. The U–V segments are handled via the binary mask of the separated voiced speech, while the U–U segments are divided evenly. This completes the separation of the unvoiced segments. Simulation and performance evaluation verify the feasibility and effectiveness of the proposed algorithm.
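The sequential grouping step in the abstract, where an objective over speaker assignments is optimized by exhaustive search or beam search, can be illustrated with a small sketch. The abstract does not give the paper's objective function, so this example assumes a stand-in: each voiced segment is represented by one feature vector (a placeholder for a GFCC summary), and a labeling is scored by the negative within-group scatter. The names `grouping_score`, `exhaustive_search`, `beam_search`, and `beam_width` are hypothetical and do not come from the paper.

```python
from itertools import product

import numpy as np


def grouping_score(features, labels):
    """Hypothetical objective (not the paper's): negative within-group scatter.

    Scores the first len(labels) segments; higher is better.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    prefix = features[: len(labels)]
    score = 0.0
    for speaker in (0, 1):
        members = prefix[labels == speaker]
        if len(members) > 0:
            score -= float(np.sum((members - members.mean(axis=0)) ** 2))
    return score


def exhaustive_search(features):
    """Try every two-speaker labeling; assumes at least one segment."""
    n = len(features)
    # Fix the first segment to speaker 0 to remove the label-swap symmetry.
    candidates = ((0,) + rest for rest in product((0, 1), repeat=n - 1))
    return max(candidates, key=lambda labels: grouping_score(features, labels))


def beam_search(features, beam_width=4):
    """Label segments one at a time, keeping the best partial labelings."""
    beam = [(0,)]
    for _ in range(1, len(features)):
        expanded = [labels + (s,) for labels in beam for s in (0, 1)]
        expanded.sort(key=lambda labels: grouping_score(features, labels),
                      reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Toy data: six segments drawn from two well-separated clusters.
toy_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
                [5.1, 5.0], [0.0, 0.1], [5.0, 5.1]]
best_exhaustive = exhaustive_search(toy_features)
best_beam = beam_search(toy_features, beam_width=4)
```

Exhaustive search evaluates 2^(N-1) labelings for N segments, which is why a beam search is attractive for longer utterances. The beam can miss the optimum if it is too narrow, since a labeling that scores well on the first few segments may be a poor prefix of the best full labeling.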

Funder

Natural Science Foundation of Shanxi Province

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218001418580028

Reference: 16 articles.

1. Computational auditory scene analysis

2. Stereo hidden Markov modeling for noise robust speech recognition

3. C. Darwin, in Computational Auditory Scene Analysis: Principles, Algorithms and Applications (Wiley-IEEE Press, 2006), p. 13.

4. A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation

5. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation

Cited by 4 articles.

1. MAVAR-SE: Multi-scale Audio-Visual Association Representation Network for End-to-End Speaker Extraction;Lecture Notes in Computer Science;2024

2. CASA BASED SUPERVISED SINGLE CHANNEL SPEAKER INDEPENDENT SPEECH SEPARATION;JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES;2019-12-28

3. Linear Predictive Coefficients-Based Feature to Identify Top-Seven Spoken Languages;International Journal of Pattern Recognition and Artificial Intelligence;2019-09-23

4. Variance based time-frequency mask estimation for unsupervised speech enhancement;Multimedia Tools and Applications;2019-07-25