Efficient FPGA implementation for sound source separation using direction-informed multichannel non-negative matrix factorization

Published:2024-03-06 Issue:9 Volume:80 Page:13411-13433
ISSN:0920-8542
Container-title:The Journal of Supercomputing
language:en
Short-container-title:J Supercomput

Author:

Diel Philipp, Muñoz-Montoro Antonio J., Carabias-Orti Julio J., Ranilla Jose

Abstract

Sound source separation (SSS) is a fundamental problem in audio signal processing, aiming to recover individual audio sources from a given mixture. A promising approach is multichannel non-negative matrix factorization (MNMF), which employs a Gaussian probabilistic model encoding both magnitude correlations and phase differences between channels through spatial covariance matrices (SCM). In this work, we present a dedicated hardware architecture implemented on field programmable gate arrays (FPGAs) for efficient SSS using MNMF-based techniques. A novel decorrelation constraint is presented to facilitate the factorization of the SCM signal model, tailored to the challenges of multichannel source separation. The performance of this FPGA-based approach is comprehensively evaluated, taking advantage of the flexibility and computational capabilities of FPGAs to create an efficient real-time source separation framework. Our experimental results demonstrate consistent, high-quality sound separation.

Funder

Ministerio de Ciencia e Innovación, Spain

HORIZON EUROPE Framework Programme

Gobierno del Principado de Asturias

RWTH Aachen University

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11227-024-05945-w.pdf

Reference (27 articles)

1. Tylka JG, Choueiri EY (2020) Fundamentals of a parametric method for virtual navigation within an array of ambisonics microphones. J Audio Eng Soc 68(3):120–137

2. Pezzoli M, Borra F, Antonacci F, Tubaro S, Sarti A (2020) A parametric approach to virtual miking for sources of arbitrary directivity. IEEE/ACM Trans Audio Speech Lang Process 28:2333–2348

3. FitzGerald D, Cranitch M, Coyle E (2005) Non-negative tensor factorisation for sound source separation. In: IEEE Irish Signals and Systems Conference, vol 2005. IEEE, pp 8–12

4. Ozerov A, Fevotte C (2010) Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Trans Audio Speech Lang Process 18(3):550–563. https://doi.org/10.1109/TASL.2009.2031510

5. Sawada H, Kameoka H, Araki S, Ueda N (2013) Multichannel extensions of non-negative matrix factorization with complex-valued data. IEEE Trans Audio Speech Lang Process 21(5):971–982. https://doi.org/10.1109/TASL.2013.2239990