Making sense of periodicity glimpses in a prediction-update-loop—A computational model of attentive voice tracking

Published:2022-02 Issue:2 Volume:151 Page:712-737
ISSN:0001-4966
Container-title:The Journal of the Acoustical Society of America
language:en
Short-container-title:The Journal of the Acoustical Society of America

Author:

Luberadzka Joanna¹, Kayser Hendrik¹, Hohmann Volker¹

Affiliation:

1. Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany

Abstract

Humans are able to follow a speaker even in challenging acoustic conditions. The perceptual mechanisms underlying this ability remain unclear. A computational model of attentive voice tracking is presented, consisting of four computational blocks: (1) sparse periodicity-based auditory features (sPAF) extraction, (2) foreground-background segregation, (3) state estimation, and (4) top-down knowledge. The model connects the theories about auditory glimpses, foreground-background segregation, and Bayesian inference. It is implemented with the sPAF, sequential Monte Carlo sampling, and probabilistic voice models. The model is evaluated by comparing it with the human data obtained in the study by Woods and McDermott [Curr. Biol. 25(17), 2238–2246 (2015)], which measured the ability to track one of two competing voices with time-varying parameters [fundamental frequency (F0) and formants (F1, F2)]. Three model versions were tested, which differ in the type of information used for the segregation: version (a) uses the oracle F0, version (b) uses the estimated F0, and version (c) uses the spectral shape derived from the estimated F0 and oracle F1 and F2. Version (a) simulates the optimal human performance in conditions with the largest separation between the voices, version (b) simulates the conditions in which the separation is not sufficient to follow the voices, and version (c) is closest to the human performance for moderate voice separation.
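To make the idea of a prediction-update loop with sequential Monte Carlo sampling more concrete, the following is a minimal sketch of a bootstrap particle filter that tracks a single time-varying F0 from sparse, noisy observations. It is not the authors' implementation: the function name `track_f0`, the random-walk dynamics, the Gaussian likelihood, and all parameter values are assumptions chosen for illustration, and the sPAF extraction, foreground-background segregation, and probabilistic voice models described in the abstract are not modeled here.

```python
import math
import random


def track_f0(glimpses, n_particles=500, drift_sd=3.0, obs_sd=5.0,
             f0_range=(100.0, 300.0), seed=0):
    """Bootstrap particle filter over a scalar F0 state in Hz (illustrative).

    glimpses: one entry per time step, either a noisy F0 observation in Hz
    or None when no glimpse is available at that step.
    Returns the posterior-mean F0 estimate for each time step.
    """
    rng = random.Random(seed)
    # Start from a flat prior over the assumed F0 range.
    particles = [rng.uniform(*f0_range) for _ in range(n_particles)]
    estimates = []
    for obs in glimpses:
        # Prediction: propagate each particle with random-walk dynamics
        # (an assumed stand-in for a voice model).
        particles = [p + rng.gauss(0.0, drift_sd) for p in particles]
        if obs is not None:
            # Update: weight particles by an assumed Gaussian likelihood
            # of the glimpse, then resample in proportion to the weights.
            weights = [math.exp(-0.5 * ((obs - p) / obs_sd) ** 2)
                       for p in particles]
            if sum(weights) > 0.0:
                particles = rng.choices(particles, weights=weights,
                                        k=n_particles)
        # Without a glimpse, the estimate relies on the prediction alone.
        estimates.append(sum(particles) / n_particles)
    return estimates
```

Calling `track_f0` with a list in which some entries are `None` shows the behavior the loop is meant to illustrate: the particle cloud spreads while glimpses are missing and contracts again once a new observation arrives.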

Publisher

Acoustical Society of America (ASA)

Subject

Acoustics and Ultrasonics, Arts and Humanities (miscellaneous)

Link

https://asa.scitation.org/doi/pdf/10.1121/10.0009337

Reference: 71 articles.

1. With or without you: predictive coding and Bayesian inference in the brain

2. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking

3. Modeling the perception of concurrent vowels: Vowels with different fundamental frequencies

4. Pitch discrimination of diotic and dichotic tone complexes: Harmonic resolvability or harmonic number?

5. On the Contribution of Target Audibility to Performance in Spatialized Speech Mixtures

Cited by 3 articles.

1. Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling;EURASIP Journal on Audio, Speech, and Music Processing;2024-05-22

2. Speech-Aware Binaural DOA Estimation Utilizing Periodicity and Spatial Features in Convolutional Neural Networks;IEEE/ACM Transactions on Audio, Speech, and Language Processing;2024

3. A Two-Stage CNN with Feature Reduction for Speech-Aware Binaural DOA Estimation;2023 31st European Signal Processing Conference (EUSIPCO);2023-09-04