Comparison of semi-supervised deep learning algorithms for audio classification-Reference-Cited by-同舟云学术

Comparison of semi-supervised deep learning algorithms for audio classification

Published:2022-09-19 Issue:1 Volume:2022 Page:
ISSN:1687-4722
Container-title:EURASIP Journal on Audio, Speech, and Music Processing
language:en
Short-container-title:J AUDIO SPEECH MUSIC PROC.

Author:

Cances Léo,Labbé Etienne,Pellegrini Thomas^ORCID

Abstract

AbstractIn this article, we adapted five recent SSL methods to the task of audio classification. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. The three other algorithms, called MixMatch (MM), ReMixMatch (RMM), and FixMatch (FM), are single-model methods that rely primarily on data augmentation strategies. Using the Wide-ResNet-28-2 architecture in all our experiments, 10% of labeled data and the remaining 90% as unlabeled data for training, we first compare the error rates of the five methods on three standard benchmark audio datasets: Environmental Sound Classification (ESC-10), UrbanSound8K (UBS8K), and Google Speech Commands (GSC). In all but one cases, MM, RMM, and FM outperformed MT and DCT significantly, MM and RMM being the best methods in most experiments. On UBS8K and GSC, MM achieved 18.02% and 3.25% error rate (ER), respectively, outperforming models trained with 100% of the available labeled data, which reached 23.29% and 4.94%, respectively. RMM achieved the best results on ESC-10 (12.00% ER), followed by FM which reached 13.33%. Second, we explored adding the mixup augmentation, used in MM and RMM, to DCT, MT, and FM. In almost all cases, mixup brought consistent gains. For instance, on GSC, FM reached 4.44% and 3.31% ER without and with mixup. Our PyTorch code will be made available upon paper acceptance at https://github.com/Labbeti/SSLH.

Funder

Agence Nationale de la Recherche

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering,Acoustics and Ultrasonics

Link

https://link.springer.com/content/pdf/10.1186/s13636-022-00255-6.pdf

Reference46 articles.

1. M. Sajjadi, M. Javanmardi, T. Tasdizen, in Advances in Neural Information Processing Systems, vol. 29, ed. by D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett. Regularization with stochastic transformations and perturbations for deep semi-supervised learning (Curran Associates, Inc., 2016), Barcelona, pp. 1163–1171.

2. S. Laine, T. Aila, in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. Temporal ensembling for semi-supervised learning (OpenReview.net, 2017). https://openreview.net/forum?id=BJ6oOfqge. Accessed 11 Aug 2022

3. T. Miyato, S. -i. Maeda, M. Koyama, S. Ishii, Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning (2018). http://arxiv.org/abs/1704.03976. Accessed 11 Aug 2022

4. Y. Grandvalet, Y. Bengio, in Advances in Neural Information Processing Systems, vol. 17, ed. by L. Saul, Y. Weiss, and L. Bottou. Semi-supervised learning by entropy minimization (MIT Press, 2005), Vancouver, pp. 529–536.

5. D. -H. Lee, Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proc. ICML 2013 Workshop: Challenges in Representation Learning (WREPL), Atlanta.

Cited by 7 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Analysis and interpretation of joint source separation and sound event detection in domestic environments;PLOS ONE;2024-07-05

2. Drug delivery system tailoring via metal-organic framework property prediction using machine learning: A disregarded approach;Materials Today Communications;2024-03

3. An automatic thaat and raga identification system using CNN-based models;Innovations in Systems and Software Engineering;2023-10-17

4. Uncertainty in Semi-Supervised Audio Classification – A Novel Extension for FixMatch;2023 31st European Signal Processing Conference (EUSIPCO);2023-09-04

5. Cosmopolite Sound Monitoring (CoSMo): A Study of Urban Sound Event Detection Systems Generalizing to Multiple Cities;ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2023-06-04