
Ensemble of convolutional neural networks to improve animal audio classification

Published:2020-05-26 Issue:1 Volume:2020
ISSN:1687-4722
Container-title:EURASIP Journal on Audio, Speech, and Music Processing
language:en
Short-container-title:J AUDIO SPEECH MUSIC PROC.

Author:

Nanni Loris, Costa Yandre M. G., Aguiar Rafael L., Mangolin Rafael B., Brahnam Sheryl, Silla Carlos N.

Abstract

In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of producing better classification accuracy than other state-of-the-art approaches without ad hoc parameter optimization. We present an ensemble of classifiers that performs competitively on different types of animal audio datasets using the same set of classifiers and parameter settings. To produce this general-purpose ensemble, we ran a large number of experiments that fine-tuned pretrained convolutional neural networks (CNNs) for different audio classification tasks (bird, bat, and whale audio datasets). Six different CNNs were tested, compared, and combined. Moreover, a further CNN, trained from scratch, was tested and combined with the fine-tuned CNNs. To the best of our knowledge, this is the largest study on CNNs in animal audio classification. Our results show that several CNNs can be fine-tuned and fused for robust and generalizable audio classification. Finally, the ensemble of CNNs is combined with handcrafted texture descriptors obtained from spectrograms for further improvement of performance. The MATLAB code used in our experiments will be provided to other researchers for future comparisons at https://github.com/LorisNanni.
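The abstract does not say which rule is used to fuse the CNNs, so the following is only a minimal sketch of one common option, score-level fusion by the sum rule (averaging per-class scores). It is written in Python rather than the authors' MATLAB, and the function names `fuse_scores` and `predict` as well as the score values are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence


def fuse_scores(score_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores from several classifiers (sum rule).

    Assumption: every classifier scores the same classes in the same order.
    """
    if not score_sets:
        raise ValueError("need scores from at least one classifier")
    n_classes = len(score_sets[0])
    if any(len(scores) != n_classes for scores in score_sets):
        raise ValueError("all classifiers must score the same classes")
    n_models = len(score_sets)
    return [sum(scores[c] for scores in score_sets) / n_models
            for c in range(n_classes)]


def predict(score_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(score_sets)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs of three CNNs for one audio clip, three classes.
cnn_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
fused = fuse_scores(cnn_scores)
label = predict(cnn_scores)
```

In this made-up example the second network alone would pick class 1, while the averaged scores favour class 0, which illustrates how fusion can override a single disagreeing model.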

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering, Acoustics and Ultrasonics

Link

https://link.springer.com/content/pdf/10.1186/s13636-020-00175-3.pdf

Reference: 65 articles.

1. M. A. Acevedo, C. J. Corrada-Bravo, H. Corrada-Bravo, L. J. Villanueva-Rivera, T. M. Aide, Automated classification of bird and amphibian calls using machine learning: a comparison of methods. Ecol. Inform. 4(4), 206–214 (2009).

2. J. Andén, S. Mallat, Deep scattering spectrum. IEEE Trans. Signal Process. 62(16), 4114–4128 (2014). https://doi.org/10.1109/TSP.2014.2326991.

3. T. Berg, P. N. Belhumeur, in 2013 IEEE Conference on Computer Vision and Pattern Recognition. Poof: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation, (2013), pp. 955–962. https://doi.org/10.1109/CVPR.2013.128.

4. S. Branson, G. Van Horn, S. Belongie, P. Perona, Bird species categorization using pose normalized deep convolutional nets. arXiv preprint (2014). arXiv:1406.2952.

5. J. Bruna, S. Mallat, Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1872–1886 (2013).

Cited by 47 articles.

1. MDF-Net: A multi-view dual-attention fusion network for efficient bird sound classification;Applied Acoustics;2024-11

2. Using Deep Learning to Classify Environmental Sounds in the Habitat of Western Black-Crested Gibbons;Diversity;2024-08-22

3. BAT-CNN: BirdNet Assisted Training for CNN;2024 International Conference on Advancements in Power, Communication and Intelligent Systems (APCI);2024-06-21

4. Teacher-Student Framework for Polyphonic Semi-supervised Sound Event Detection: Survey and Empirical Analysis;ACM Transactions on Intelligent Systems and Technology;2024-04-23

5. Capsule network-based deep ensemble transfer learning for multimodal sentiment analysis;Expert Systems with Applications;2024-04