Abstract
AbstractIn this paper, we propose a score-informed source separation framework based on non-negative matrix factorization (NMF) and dynamic time warping (DTW) that suits for both offline and online systems. The proposed framework is composed of three stages: training, alignment, and separation. In the training stage, the score is encoded as a sequence of individual occurrences and unique combinations of notes denoted as score units. Then, we proposed a NMF-based signal model where the basis functions for each score unit are represented as a weighted combination of spectral patterns for each note and instrument in the score obtained from a trained a priori over-completed dictionary. In the alignment stage, the time-varying gains are estimated at frame level by computing the projection of each score unit basis function over the captured audio signal. Then, under the assumption that only a score unit is active at a time, we propose an online DTW scheme to synchronize the score information with the performance. Finally, in the separation stage, the obtained gains are refined using local low-rank NMF and the separated sources are obtained using a soft-filter strategy. The framework has been evaluated and compared with other state-of-the-art methods for single channel source separation of small ensembles and large orchestra ensembles obtaining reliable results in terms of SDR and SIR. Finally, our method has been evaluated in the specific task of acoustic minus one, and some demos are presented.
Publisher
Springer Science and Business Media LLC
Subject
Electrical and Electronic Engineering,Acoustics and Ultrasonics
Reference80 articles.
1. F. J. Canadas-Quesada, D. Fitzgerald, P. Vera-Candeas, N. Ruiz-Reyes, in Proceedings of the 20th International Conference on Digital Audio Effects (DAFx-17). Harmonic-percussive sound separation using rhythmic information from non-negative matrix factorization in single-channel music recordings (Edinburgh, 2017), pp. 276–282.
2. J. -L. Durrieu, G. Richard, B. David, C. Fevotte, Source/filter model for unsupervised main melody extraction from polyphonic audio signals. IEEE Trans. Audio Speech Lang. Process.18(3), 564–575 (2010). https://doi.org/10.1109/TASL.2010.2041114.
3. J. Nikunen, T. Virtanen, Direction of arrival based spatial covariance model for blind sound source separation. IEEE/ACM Trans. Audio Speech Lang. Process.22(3), 727–739 (2014). https://doi.org/10.1109/TASLP.2014.2303576.
4. J. J. Carabias-Orti, J. Nikunen, T. Virtanen, P. Vera-Candeas, Multichannel blind sound source separation using spatial covariance model with level and time differences and nonnegative matrix factorization. IEEE/ACM Trans. Audio Speech Lang Process.26(9), 1512–1527 (2018). https://doi.org/10.1109/TASLP.2018.2830105.
5. L. Wang, H. Ding, F. Yin, Combining superdirective beamforming and frequency-domain blind source separation for highly reverberant signals. EURASIP J. Audio Speech Music. Process.2010(1), 1–13 (2010). https://doi.org/10.1155/2010/797962.
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献