An Ensemble of Convolutional Neural Networks for Audio Classification

Published:2021-06-22 Issue:13 Volume:11 Page:5796
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Nanni Loris, Maguolo Gianluca, Brahnam Sheryl, Paci Michelangelo

Abstract

Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. One important area in this field is environmental sound recognition, whether it concerns the identification of endangered species in different habitats or the type of interfering noise in urban environments. Since environmental audio datasets are often limited in size, a robust model able to perform well across different datasets is of strong research interest. In this paper, we combine ensembles of classifiers that exploit six data augmentation techniques and four signal representations to retrain five pre-trained convolutional neural networks (CNNs); these ensembles are tested on three freely available environmental audio benchmark datasets: (i) bird calls, (ii) cat sounds, and (iii) the Environmental Sound Classification (ESC-50) database for identifying sources of noise in environments. To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification. The best-performing ensembles are compared and shown to either outperform or perform comparably to the best methods reported in the literature on these datasets, including on the challenging ESC-50 dataset. We obtained 97% accuracy on the bird dataset, 90.51% on the cat dataset, and 88.65% on ESC-50 using different approaches. In addition, the same ensemble model trained on the three datasets managed to reach the same results on the bird and cat datasets while losing only 0.1% on ESC-50. Thus, we have managed to create an off-the-shelf ensemble that can be trained on different datasets and reach performance competitive with the state of the art.
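The abstract does not say how the individual CNN outputs are combined. As a rough illustration only, the sketch below assumes a simple sum rule (averaging each model's per-class scores and picking the highest average). The function names and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def sum_rule_fusion(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores across models (sum rule; an assumption here)."""
    if not model_scores:
        raise ValueError("need scores from at least one model")
    n_classes = len(model_scores[0])
    if any(len(s) != n_classes for s in model_scores):
        raise ValueError("all models must score the same set of classes")
    n_models = len(model_scores)
    # For each class, sum the scores of all models and divide by the model count.
    return [sum(s[c] for s in model_scores) / n_models for c in range(n_classes)]


def predict(model_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = sum_rule_fusion(model_scores)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs of three fine-tuned CNNs for one audio clip.
scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
fused = sum_rule_fusion(scores)
label = predict(scores)
```

In this made-up example the second network prefers class 1, but the averaged scores still favor class 0. This is the kind of disagreement an ensemble is meant to smooth out.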

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/13/5796/pdf

References: 59 articles.

1. Machine Learning in Automatic Speech Recognition: A Survey

2. Combining visual and acoustic features for audio classification tasks

3. Multimodal Biometric Person Authentication: A Review

Cited by 65 articles.

1. HM–GDM: Hybrid Measures and Graph-Dependent Modeling for Environmental Sound Classification;International Journal of Computational Intelligence Systems;2024-08-12

2. Efficient Deep Neural Network Compression for Environmental Sound Classification on Microcontroller Units;Turkish Journal of Electrical Engineering and Computer Sciences;2024-07-26

3. Detecting Selected Instruments in the Sound Signal;Applied Sciences;2024-07-20

4. BAT-CNN: BirdNet Assisted Training for CNN;2024 International Conference on Advancements in Power, Communication and Intelligent Systems (APCI);2024-06-21

5. Responding to challenge call for machine learning model development in diagnosing respiratory disease sounds;Journal of Edge Computing;2024-05-21