Linear Methods for Efficient and Fast Separation of Two Sources Recorded with a Single Microphone

Published:2015-10 Issue:10 Volume:27 Page:2231-2259
ISSN:0899-7667
Container-title:Neural Computation
language:en

Author:

Bhargava Saurabh¹, Blättler Florian¹, Kollmorgen Sepp¹, Liu Shih-Chii¹, Hahnloser Richard H. R.¹

Affiliation:

1. Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, 8057, Switzerland

Abstract

This letter addresses the problem of separating two speakers from a single microphone recording. Three linear methods are tested for source separation, all of which operate directly on sound spectrograms: (1) eigenmode analysis of covariance difference to identify spectro-temporal features associated with large variance for one source and small variance for the other source; (2) maximum likelihood demixing in which the mixture is modeled as the sum of two gaussian signals and maximum likelihood is used to identify the most likely sources; and (3) suppression-regression, in which autoregressive models are trained to reproduce one source and suppress the other. These linear approaches are tested on the problem of separating a known male from a known female speaker. The performance of these algorithms is assessed in terms of the residual error of estimated source spectrograms, waveform signal-to-noise ratio, and perceptual evaluation of speech quality scores. This work shows that the algorithms compare favorably to nonlinear approaches such as nonnegative sparse coding in terms of simplicity, performance, and suitability for real-time implementations, and they provide benchmark solutions for monaural source separation tasks.
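The first method in the abstract, eigenmode analysis of covariance difference, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`covariance_difference_modes`, `project_onto_modes`), the synthetic Gaussian "sources", and the choice to estimate a source by projecting the mixture onto the leading eigenvectors are all assumptions made for illustration. The idea shown is only that eigenvectors of cov(A) - cov(B) with large positive eigenvalues span features where source A has large variance and source B has small variance.

```python
import numpy as np

def covariance_difference_modes(feats_a, feats_b):
    """Eigen-decompose cov(A) - cov(B).

    feats_a, feats_b: arrays of shape (n_samples, n_features), e.g.
    flattened spectrogram patches from each source (hypothetical input).
    Returns eigenvalues in descending order and the matching
    eigenvectors as columns.
    """
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # The difference of two symmetric matrices is symmetric, so eigh applies.
    eigvals, eigvecs = np.linalg.eigh(cov_a - cov_b)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def project_onto_modes(feats, eigvecs, n_modes):
    """Keep only the components of feats along the first n_modes eigenvectors."""
    basis = eigvecs[:, :n_modes]
    return feats @ basis @ basis.T

# Toy data (an assumption, not speech): source A has high variance in the
# first four features, source B in the last four.
rng = np.random.default_rng(0)
n_samples, n_features = 5000, 8
scale_a = np.array([3.0, 3.0, 3.0, 3.0, 0.3, 0.3, 0.3, 0.3])
scale_b = scale_a[::-1]
src_a = rng.normal(size=(n_samples, n_features)) * scale_a
src_b = rng.normal(size=(n_samples, n_features)) * scale_b

eigvals, eigvecs = covariance_difference_modes(src_a, src_b)

# Estimate source A from the mixture using the modes with positive eigenvalues.
mixture = src_a + src_b
est_a = project_onto_modes(mixture, eigvecs, n_modes=4)

err_projected = np.mean((est_a - src_a) ** 2)
err_mixture = np.mean((mixture - src_a) ** 2)
print("residual error, projected estimate:", err_projected)
print("residual error, raw mixture:       ", err_mixture)
```

On this toy data the projected estimate has a lower residual error than the raw mixture, which is the kind of comparison the abstract's "residual error of estimated source spectrograms" refers to. Real spectrogram features would need a short-time Fourier transform front end and a resynthesis step, neither of which is sketched here.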

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00776

Cited by 1 article.

1. Sparse NMF based speech enhancement with bases update; International Journal of Speech Technology; 2017-05-09