You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection

Published:2022-03-24 Issue:7 Volume:12 Page:3293
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Venkatesh Satvik, Moffat David, Miranda Eduardo Reck

Abstract

Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio-indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification. This technique divides audio into small frames and individually performs classification on these frames. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in Computer Vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons to detect the presence of an audio class and predict its start and end points. The relative improvement for F-measure of YOHO, compared to the state-of-the-art Convolutional Recurrent Neural Network, ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. As the output of YOHO is more end-to-end and has fewer neurons to predict, the speed of inference is at least 6 times faster than segmentation-by-classification. In addition, as this approach predicts acoustic boundaries directly, the post-processing and smoothing are about 7 times faster.
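To illustrate the idea of separate output neurons for presence, start, and end, the following is a minimal sketch of how such an output might be decoded into events. It is not the authors' implementation. The output layout (one row per time bin, three values per class), the normalisation of start and end to the range 0 to 1 within a bin, the 0.5 threshold, the bin length, and the name `decode_events` are all assumptions made for illustration.

```python
import numpy as np


def decode_events(output, class_names, bin_seconds, threshold=0.5):
    """Decode a (num_bins, 3 * num_classes) array into (label, start, end) tuples.

    Assumed layout (hypothetical): for class index c, columns 3c, 3c+1, 3c+2
    hold presence, start, and end, with start and end expressed as fractions
    of the bin length.
    """
    events = []
    for bin_index, row in enumerate(output):
        bin_offset = bin_index * bin_seconds
        for class_index, label in enumerate(class_names):
            presence, start, end = row[3 * class_index: 3 * class_index + 3]
            if presence >= threshold:
                # Regressed boundaries are read directly; no frame-wise smoothing.
                events.append((
                    label,
                    float(bin_offset + start * bin_seconds),
                    float(bin_offset + end * bin_seconds),
                ))
    return events


# Two time bins, two classes; the values are made up for the example.
example_output = np.array([
    [0.9, 0.25, 1.0, 0.1, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.8, 0.0, 0.5],
])
events = decode_events(example_output, ["speech", "music"], bin_seconds=2.0)
print(events)
```

Because each bin yields at most one boundary pair per class, the decoding loop touches far fewer values than a frame-by-frame classifier would, which is consistent with the speed argument in the abstract. In practice, events that span adjacent bins would still need to be merged.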

Funder

Engineering and Physical Sciences Research Council

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/7/3293/pdf

References: 58 articles.

1. Audio segmentation of broadcast news in the Albayzin-2010 evaluation: overview, results, and discussion

2. Task 4: Large-Scale Weakly Supervised Sound Event Detection for Smart Cars http://dcase.community/challenge2017/task-large-scale-sound-event-detection

3. Towards the automatic classification of avian flight calls for bioacoustic monitoring; Salamon; PLoS ONE, 2016

4. A Deep Learning Approach to Intelligent Drum Mixing With the Wave-U-Net

Cited by 18 articles.

1. Sound event detection in traffic scenes based on graph convolutional network to obtain multi-modal information;Complex & Intelligent Systems;2024-05-16

2. Light Gated Multi Mini-Patch Extractor for Audio Classification;2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW);2024-04-14

3. Revolutionizing Healthcare: NLP, Deep Learning, and WSN Solutions for Managing the COVID-19 Crisis;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-01-05

4. A safety-oriented framework for sound event detection in driving scenarios;Applied Acoustics;2024-01

5. A Systematic Review of Rare Events Detection Across Modalities Using Machine Learning and Deep Learning;IEEE Access;2024