Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling-Reference-Cited by-同舟云学术

Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling

Published:2024-05-22 Issue:1 Volume:2024 Page:
ISSN:1687-4722
Container-title:EURASIP Journal on Audio, Speech, and Music Processing
language:en
Short-container-title:J AUDIO SPEECH MUSIC PROC.

Author:

Luberadzka Joanna,Kayser Hendrik,Lücke Jörg,Hohmann Volker

Abstract

AbstractSelective attention is a crucial ability of the auditory system. Computationally, following an auditory object can be illustrated as tracking its acoustic properties, e.g., pitch, timbre, or location in space. The difficulty is related to the fact that in a complex auditory scene, the information about the tracked object is not available in a clean form. The more cluttered the sound mixture, the more time and frequency regions where the object of interest is masked by other sound sources. How does the auditory system recognize and follow acoustic objects based on this fragmentary information? Numerous studies highlight the crucial role of top-down processing in this task. Having in mind both auditory modeling and signal processing applications, we investigated how computational methods with and without top-down processing deal with increasing sparsity of the auditory features in the task of estimating instantaneous voice states, defined as a combination of three parameters: fundamental frequency F0 and formant frequencies F1 and F2. We found that the benefit from top-down processing grows with increasing sparseness of the auditory data.

Funder

Deutsche Forschungsgemeinschaft

Carl von Ossietzky Universität Oldenburg

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13636-024-00350-w.pdf

Reference37 articles.

1. M. Cooke, A glimpsing model of speech perception in noise. J. Acoust. Soc. Am. 119(3), 1562–1573 (2006)

2. E. Schoenmaker, S. van de Par. Intelligibility for binaural speech with discarded low-SNR speech components, in Physiology, psychoacoustics and cognition in normal and impaired hearing (Springer International Publishing, 2016), pp. 73–81

3. R.L. Gregory, Perceptions as hypotheses. Philos. Trans. R. Soc. Lond. B Biol. Sci. 290(1038), 181–197 (1980)

4. J. Luberadzka, H. Kayser, V. Hohmann, Making sense of periodicity glimpses in a prediction-update-loop–a computational model of attentive voice tracking. J Acoust. Soc. Am. 151(2), 712–737 (2022)

5. K.J. Woods, J.H. McDermott, Attentive tracking of sound sources. Curr. Biol. 25(17), 2238–2246 (2015)

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Temporal neural dynamics of understanding communicative intentions from speech prosody;NeuroImage;2024-10