Acoustic compression in Zoom audio does not compromise voice recognition performance

Published:2023-10-31 Issue:1 Volume:13
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Perepelytsia Valeriia, Dellwo Volker

Abstract

Human voice recognition over telephone channels typically yields lower accuracy when compared to audio recorded in a studio environment with higher quality. Here, we investigated the extent to which audio in video conferencing, subject to various lossy compression mechanisms, affects human voice recognition performance. Voice recognition performance was tested in an old–new recognition task under three audio conditions (telephone, Zoom, studio) across all matched (familiarization and test with same audio condition) and mismatched combinations (familiarization and test with different audio conditions). Participants were familiarized with female voices presented in either studio-quality (N = 22), Zoom-quality (N = 21), or telephone-quality (N = 20) stimuli. Subsequently, all listeners performed an identical voice recognition test containing a balanced stimulus set from all three conditions. Results revealed that voice recognition performance (dʹ) in Zoom audio was not significantly different to studio audio but both in Zoom and studio audio listeners performed significantly better compared to telephone audio. This suggests that signal processing of the speech codec used by Zoom provides equally relevant information in terms of voice recognition compared to studio audio. Interestingly, listeners familiarized with voices via Zoom audio showed a trend towards a better recognition performance in the test (p = 0.056) compared to listeners familiarized with studio audio. We discuss future directions according to which a possible advantage of Zoom audio for voice recognition might be related to some of the speech coding mechanisms used by Zoom.
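
For readers unfamiliar with the sensitivity index dʹ reported in the abstract, the following is a minimal sketch of how dʹ is commonly computed for an old–new recognition task, as the difference between the z-transformed hit rate and false-alarm rate. The function name `d_prime`, its parameters, and the log-linear correction are assumptions made for illustration; the abstract does not state how the authors computed dʹ.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    "Old" trials contain a familiarized voice (hit or miss);
    "new" trials contain an unfamiliar voice (false alarm or
    correct rejection).
    """
    n_old = hits + misses
    n_new = false_alarms + correct_rejections
    # Log-linear correction (an assumption, not taken from the paper):
    # keeps both rates strictly between 0 and 1 so the z-transform
    # stays finite.
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one listener: 20 old and 20 new test trials.
sensitivity = d_prime(hits=18, misses=2, false_alarms=4, correct_rejections=16)
```

A dʹ of zero corresponds to chance-level recognition, and larger values indicate that the listener separates familiarized from unfamiliar voices more reliably.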

Funder

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-023-45971-x.pdf

Reference: 62 articles.

1. Dellwo, V., Pellegrino, E., He, L. & Kathiresan, T. The dynamics of indexical information in speech: Can recognizability be controlled by the speaker? AUC Philol. 2019, 57–75 (2019).

2. Kreiman, J. & Sidtis, D. Foundations of Voice Studies: An Interdisciplinary Approach to Voice Production and Perception (Wiley, 2011).

3. Sidtis, D. & Kreiman, J. In the beginning was the familiar voice: Personally familiar voices in the evolutionary and contemporary biology of communication. Integr. Psychol. Behav. Sci. 46, 146–159 (2012).

4. Nygaard, L. C. & Pisoni, D. B. Talker-specific learning in speech perception. Percept. Psychophys. 60, 355–376 (1998).

5. Souza, P., Gehani, N., Wright, R. & McCloy, D. The advantage of knowing the talker. J. Am. Acad. Audiol. 24, 689–700 (2013).

Cited by 2 articles.

1. Impact of Audio Data Compression on Feature Extraction for Vocal Biomarker Detection: Validation Study;JMIR Biomedical Engineering;2024-04-15

2. Exploring the feasibility of remote administration of speech audiometry: A comparative study of conventional and digital methods;DIGITAL HEALTH;2024-01