Affiliation:
1. Com2uS Corporation, Seoul 08506, Republic of Korea
2. Department of Computer Engineering, Myongji University, Yongin 17058, Republic of Korea
Abstract
Audio auto-tagging is the process of assigning labels to audio clips for better categorization and management of audio file databases. With the advent of advanced artificial intelligence techniques, there has been growing interest in feeding raw audio directly into deep learning models for tagging, eliminating the need for preprocessing. However, most current audio auto-tagging studies cannot effectively reflect the semantic relationships between tags, such as the connection between "classical music" and "cello". In this paper, we propose a novel method that enhances audio auto-tagging performance via joint embedding. Our model is designed to capture the semantic information within the tag domain. In experiments on the MagnaTagATune (MTAT) dataset, which has high inter-tag correlations, and the Speech Commands dataset, which has none, our approach improved the performance of existing models when inter-tag correlations were strong.
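The abstract does not give implementation details, but the general idea of joint embedding for tagging can be sketched as follows: an audio clip and all candidate tags are mapped into a shared vector space, and each tag is scored by its similarity to the audio embedding, so that semantically related tags (e.g. "classical music" and "cello") can reinforce one another. The dimensions, the random stand-in vectors, and the cosine-plus-threshold scoring below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): a 128-d shared space, 5 tags.
EMB_DIM, NUM_TAGS = 128, 5

# Stand-ins for learned parameters: an audio embedding produced by some
# encoder over raw audio, and one embedding per tag (in practice these
# might be initialized from word vectors so that related tags start close
# together in the shared space).
audio_emb = rng.normal(size=EMB_DIM)
tag_embs = rng.normal(size=(NUM_TAGS, EMB_DIM))

def cosine_scores(audio, tags):
    """Score every tag by cosine similarity to the audio in the joint space."""
    a = audio / np.linalg.norm(audio)
    t = tags / np.linalg.norm(tags, axis=1, keepdims=True)
    return t @ a  # shape: (NUM_TAGS,), each value in [-1, 1]

scores = cosine_scores(audio_emb, tag_embs)
predicted = scores > 0.0  # simple per-tag threshold for multi-label tagging
```

During training, the two sets of embeddings would be optimized jointly (e.g. with a ranking or contrastive loss) so that an audio clip lands near its ground-truth tags; the threshold here merely turns similarity scores into multi-label predictions.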
Funder
National Research Foundation of Korea (NRF) grant funded by the Korea government
2022 Research Fund of Myongji University
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
5. Mesaros, A., Heittola, T., Dikmen, O., and Virtanen, T. (2015, January 19–24). Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QL, Australia.