Affiliation:
1. Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Kitakyushu, Fukuoka 808-0135, Japan
Abstract
Automatic Singing Label Calibration (ASLC) aims to improve the accuracy of coarse singing labels by analyzing the raw audio. However, the ASLC model is limited by the difficulty and cost of collecting or augmenting real-world songs. To address this problem, we propose a novel approach that strengthens limited singing audio with easily available musical instrument audio. Directly using musical instrument audio to augment singing audio is unreliable because of the distinct differences between vocal and instrumental sounds. We therefore employ transfer learning, which transfers relevant knowledge from one domain to another. In the pre-training stage, the ASLC model learns to predict accurate labels from musical instrument audio. We then treat the voice as a special musical instrument and fine-tune the pre-trained ASLC model on a singing annotation data set. Experimental results demonstrate that our transfer learning-based approach outperforms the original ASLC model: by leveraging readily available musical instrument audio, it improves the labeling accuracy of singing audio. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
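The two-stage workflow described in the abstract, pre-training on plentiful instrument data and then fine-tuning on scarce singing data, can be sketched as follows. This is a minimal toy illustration, not the paper's ASLC model: the linear model, the synthetic data, and the learning rates are all assumptions chosen only to show the pretrain-then-finetune pattern.

```python
# Toy sketch of transfer learning: pre-train on a plentiful source
# domain ("instrument" data), then fine-tune on a small target domain
# ("singing" data), reusing the pre-trained weights as the start point.

def train(w, b, data, lr=0.1, epochs=200):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in data:
            err = (w * x + b) - y
            dw += 2 * err * x / len(data)
            db += 2 * err / len(data)
        w -= lr * dw
        b -= lr * db
    return w, b

# Stage 1: pre-train on many "instrument" examples (toy rule: y = 2x).
instrument_data = [(x / 10, 2 * x / 10) for x in range(10)]
w, b = train(0.0, 0.0, instrument_data)

# Stage 2: fine-tune on a few "singing" examples (toy rule: y = 2x + 0.5),
# starting from the pre-trained weights with a smaller learning rate so
# the transferred knowledge is adapted rather than overwritten.
singing_data = [(0.2, 0.9), (0.5, 1.5), (0.8, 2.1)]
w, b = train(w, b, singing_data, lr=0.02, epochs=100)
```

In a real system the linear model would be replaced by the ASLC network and the gradient step by a deep-learning framework's optimizer, but the control flow (pre-train, then fine-tune from the pre-trained weights) is the same.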