A Multimodal Music Emotion Classification Method Based on Multifeature Combined Network Classifier

Published:2020-08-01 Issue: Volume:2020 Page:1-11
ISSN:1024-123X
Container-title:Mathematical Problems in Engineering
language:en
Short-container-title:Mathematical Problems in Engineering

Author:

Chen Changfeng¹, Li Qiang¹

Affiliation:

1. Institute of Intelligent and Software Technology, Hangzhou Dianzi University, Hangzhou 310018, China

Abstract

To address the shortcomings of single-network classification models, this paper applies a combined CNN-LSTM (convolutional neural network and long short-term memory) network to music emotion classification and proposes a multifeature combined network classifier based on it. The classifier feeds 2D (two-dimensional) features through the CNN-LSTM and 1D (one-dimensional) features through a DNN (deep neural network), compensating for the deficiencies of the original single-feature models. The model uses multiple convolution kernels in the CNN for 2D feature extraction and a BiLSTM (bidirectional LSTM) for sequence processing, and it is applied separately to audio and to lyrics to produce single-modal emotion classification outputs. For audio feature extraction, the music audio is finely segmented and the human voice is separated out to obtain pure background sound clips, from which the spectrogram and LLDs (low-level descriptors) are extracted. For lyrics feature extraction, the chi-squared test vector and the word embeddings extracted by Word2vec are each used as feature representations of the lyrics. Combining these two heterogeneous feature types, for audio and for lyrics alike, through the classification model can improve classification performance. To fuse the emotional information of the two modalities, music audio and lyrics, this paper proposes a multimodal ensemble learning method based on stacking. Unlike existing feature-level and decision-level fusion methods, this method avoids the information loss caused by direct dimensionality reduction: the original features are converted into label results for fusion, which effectively addresses the problem of feature heterogeneity. Experiments on the Million Song Dataset show that the multifeature combined network classifier reaches an audio classification accuracy of 68% and a lyrics classification accuracy of 74%. The average multimodal classification accuracy reaches 78%, a significant improvement over the single-modal results.
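The stacking-based fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the nearest-centroid base learners stand in for the CNN-LSTM classifiers, the lookup-table meta-learner is an assumption (the abstract does not say which meta-learner is used), and the three emotion labels, the one-dimensional features, and all function names are hypothetical toy choices. It only shows the general idea of converting each modality's features into label results and fusing those labels.

```python
from collections import Counter, defaultdict


def fit_centroids(X, y):
    """Stand-in base learner: one mean feature vector per emotion label."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, x):
    """Return the label of the nearest centroid (ties go to the first-seen label)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, model[label]))
    return min(model, key=sq_dist)


def out_of_fold_labels(X, y, k=3):
    """Predict each sample with a base learner that never saw it during training."""
    n = len(X)
    preds = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = predict_centroid(model, X[i])
    return preds


def fit_meta(audio_labels, lyric_labels, y):
    """Assumed meta-learner: majority true label for each (audio, lyrics) label pair."""
    votes = defaultdict(Counter)
    for a, l, truth in zip(audio_labels, lyric_labels, y):
        votes[(a, l)][truth] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in votes.items()}


def fit_stacking(X_audio, X_lyrics, y, k=3):
    """Train the meta-learner on out-of-fold labels, then refit base learners on all data."""
    meta = fit_meta(out_of_fold_labels(X_audio, y, k),
                    out_of_fold_labels(X_lyrics, y, k), y)
    return fit_centroids(X_audio, y), fit_centroids(X_lyrics, y), meta


def predict_stacking(models, x_audio, x_lyrics):
    """Fuse the two single-modal label outputs through the meta-learner."""
    audio_model, lyric_model, meta = models
    a = predict_centroid(audio_model, x_audio)
    l = predict_centroid(lyric_model, x_lyrics)
    return meta.get((a, l), l)  # unseen pair: fall back to the lyrics label (arbitrary choice)


# Hypothetical toy data: audio cannot separate happy from angry,
# and lyrics cannot separate angry from sad.
y = ["happy"] * 3 + ["angry"] * 3 + ["sad"] * 3
X_audio = [[0.9]] * 3 + [[0.9]] * 3 + [[0.1]] * 3
X_lyrics = [[0.9]] * 3 + [[0.1]] * 3 + [[0.1]] * 3

models = fit_stacking(X_audio, X_lyrics, y)
fused = [predict_stacking(models, a, l) for a, l in zip(X_audio, X_lyrics)]
print(fused)
```

In this toy setup each base learner alone confuses one pair of classes, while the meta-learner separates all three because it sees both label outputs together. Training the meta-learner on out-of-fold labels is a common stacking practice that keeps it from learning from predictions the base learners made on their own training samples.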

Publisher

Hindawi Limited

Subject

General Engineering,General Mathematics

Link

http://downloads.hindawi.com/journals/mpe/2020/4606027.pdf

Cited by 35 articles.

1. BLNN: a muscular and tall architecture for emotion prediction in music;Soft Computing;2024-07-18

2. Verse1-Chorus-Verse2 Structure: A Stacked Ensemble Approach for Enhanced Music Emotion Recognition;Applied Sciences;2024-07-01

3. MMD-MII Model: A Multilayered Analysis and Multimodal Integration Interaction Approach Revolutionizing Music Emotion Classification;International Journal of Computational Intelligence Systems;2024-04-22

4. Automatic music mood classification using multi-modal attention framework;Engineering Applications of Artificial Intelligence;2024-02

5. A Study of Emotion Classification of Music Lyrics using LSTM Networks;2024 5th International Conference on Mobile Computing and Sustainable Informatics (ICMCSI);2024-01-18