Affiliation:
1. Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Kitakyushu, Fukuoka 808-0135, Japan
Abstract
Automatic Singing Label Calibration (ASLC) aims to improve the accuracy of coarse singing labels by analyzing the raw audio. However, the ASLC model is limited by the difficulty and cost of collecting or augmenting real-world songs. To address this problem, we propose a novel approach that strengthens limited singing audio with easily available musical instrument audio. Directly using musical instrument audio to augment singing audio is unreliable because of the distinct differences between vocal and instrumental sounds. We therefore employ transfer learning, which transfers relevant knowledge from one domain to another. In the pre-training stage, the ASLC model learns to predict accurate labels from musical instrument audio. We then treat the voice as a special musical instrument and fine-tune the pre-trained ASLC model on a singing annotation data set. Experimental results demonstrate that our transfer learning-based approach outperforms the original ASLC model: by leveraging readily available musical instrument audio, it improves the labeling accuracy of singing audio. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
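The two-stage workflow described in the abstract, pre-training on plentiful instrument data and then fine-tuning on scarce singing data, can be sketched as follows. This is a minimal toy illustration, not the paper's ASLC model: the linear model, the synthetic data, and the learning rates are all assumptions chosen only to show the pretrain-then-finetune pattern.

```python
# Toy sketch of transfer learning: pre-train on a plentiful source
# domain ("instrument" data), then fine-tune on a small target domain
# ("singing" data), reusing the pre-trained weights as the start point.

def train(w, b, data, lr=0.1, epochs=200):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in data:
            err = (w * x + b) - y
            dw += 2 * err * x / len(data)
            db += 2 * err / len(data)
        w -= lr * dw
        b -= lr * db
    return w, b

# Stage 1: pre-train on many "instrument" examples (toy rule: y = 2x).
instrument_data = [(x / 10, 2 * x / 10) for x in range(10)]
w, b = train(0.0, 0.0, instrument_data)

# Stage 2: fine-tune on a few "singing" examples (toy rule: y = 2x + 0.5),
# starting from the pre-trained weights with a smaller learning rate so
# the transferred knowledge is adapted rather than overwritten.
singing_data = [(0.2, 0.9), (0.5, 1.5), (0.8, 2.1)]
w, b = train(w, b, singing_data, lr=0.02, epochs=100)
```

In a real system the linear model would be replaced by the ASLC network and the gradient step by a deep-learning framework's optimizer, but the control flow (pre-train, then fine-tune from the pre-trained weights) is the same.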