Affiliation:
1. National Taiwan University, Hsinchu, Taiwan
2. National Taiwan University, Taipei, Taiwan
Abstract
Most music genre classification approaches extract acoustic features from short-time frames to capture timbre information, leading to the common bag-of-frames framework. However, time-frequency analysis is also vital for modeling music genres. This article proposes multilevel visual features that capture spectrogram textures and their temporal variations, together with a confidence-based late fusion scheme for combining the acoustic and visual features. Experimental results indicate that the proposed method improves accuracy by approximately 14% on the world's largest benchmark dataset (MASD) and by 2% on the Unique dataset. Moreover, the proposed approach won the Music Information Retrieval Evaluation eXchange (MIREX) music genre classification contests from 2011 to 2013, demonstrating the feasibility and necessity of combining acoustic and visual features for classifying music genres.
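To make the two ingredients named in the abstract concrete, the following is a minimal, hypothetical sketch: texture-style "visual" features computed from a log-mel spectrogram treated as an image, and a confidence-based late fusion of the posteriors from one classifier per feature stream. This is not the authors' implementation; the block statistics, the weighting of each stream by its maximum posterior, and the names visual_features and late_fusion are illustrative assumptions, and librosa, NumPy, and scikit-learn are assumed to be available.

```python
# Hypothetical sketch (not the paper's method): spectrogram-texture features
# plus confidence-based late fusion of two classifiers' posteriors.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def visual_features(y, sr, n_mels=128, n_blocks=8):
    """Treat the log-mel spectrogram as an image and summarize its texture
    with simple per-block statistics (mean/std of each mel band over time)."""
    S = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    blocks = np.array_split(S, n_blocks, axis=1)  # coarse temporal segments
    feats = [np.r_[b.mean(axis=1), b.std(axis=1)] for b in blocks]
    return np.concatenate(feats)

def late_fusion(p_acoustic, p_visual):
    """Confidence-based late fusion: weight each stream's posterior by how
    peaked (confident) it is, then renormalize to a distribution."""
    p = p_acoustic.max() * p_acoustic + p_visual.max() * p_visual
    return p / p.sum()

# Usage (illustrative): train one classifier per feature stream, then fuse.
# clf_a = LogisticRegression(max_iter=1000).fit(X_acoustic, labels)
# clf_v = LogisticRegression(max_iter=1000).fit(X_visual, labels)
# p = late_fusion(clf_a.predict_proba(x_a)[0], clf_v.predict_proba(x_v)[0])
# predicted_genre = p.argmax()
```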
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture
Cited by 21 articles.