New Avenues in Audio Intelligence: Towards Holistic Real-life Audio Understanding

Published: 2021-01 Volume: 25 Page: 233121652110461
ISSN: 2331-2165
Container-title: Trends in Hearing
Language: en
Short-container-title: Trends in Hearing

Author:

Schuller Björn¹²³, Baird Alice¹, Gebhard Alexander¹, Amiriparian Shahin¹³, Keren Gil¹, Schmitt Maximilian¹, Cummins Nicholas¹⁴

Affiliation:

1. University of Augsburg, Augsburg, Germany

2. GLAM – Group on Language, Audio & Music, Imperial College, London, UK

3. audEERING GmbH, Germany

4. Department of Biostatistics and Health Informatics, IoPPN, King’s College London, UK

Abstract

Computer audition (i.e., intelligent audio) has made great strides in recent years; however, it is still far from achieving holistic hearing abilities that more closely mimic human-like understanding. Within an audio scene, a human listener can quickly interpret layers of sound at a single time-point, with each layer varying in characteristics such as location, state, and trait. Current integrated machine listening approaches, by contrast, mainly recognise only single events. In this context, this contribution aims to provide key insights and approaches that can be applied in computer audition to achieve a more holistic intelligent understanding system, and to identify challenges in reaching this goal. We first summarise the state of the art in traditional signal-processing-based audio pre-processing and feature representation, as well as in automated learning, such as by deep neural networks. This concerns, in particular, audio interpretation, decomposition, understanding, and ontologisation. We then present an agent-based approach for integrating these concepts into a holistic audio understanding system. Based on this, we conclude with avenues towards reaching the ambitious goal of ‘holistic human-parity’ machine listening abilities.

Funder

The Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B).

Publisher

SAGE Publications

Subject

Speech and Hearing, Otorhinolaryngology

Link

http://journals.sagepub.com/doi/pdf/10.1177/23312165211046135

Reference: 98 articles.

1. VQA: Visual Question Answering

2. CAST a database: Rapid targeted large-scale big data acquisition via small-world modelling of social media platforms

3. Speaker Diarization: A Review of Recent Research

Cited by 2 articles.

1. Patients' Experience to MRI Examinations—A Systematic Qualitative Review With Meta‐Synthesis;Journal of Magnetic Resonance Imaging;2024-03-27

2. Information Disclosure in the Era of Voice Technology;Journal of Marketing;2023-03-22