Transformer-based Automatic Music Mood Classification Using Multi-modal Framework
-
Published:2023-04-03
Issue:1
Volume:23
Page:e02
-
ISSN:1666-6038
-
Container-title:Journal of Computer Science and Technology
-
Short-container-title:JCS&T
Author:
A. S. Sujeesha,
Rajan Rajeev
Abstract
Mood is a psychological state of feeling related to internal emotions and affect, which is how emotions are expressed outwardly. Studies show that music affects our moods, and that we tend to choose music based on our current mood. Audio-based techniques can achieve promising results, but lyrics also carry relevant information about a song's mood that may not be present in the audio. A multi-modal system combining both textual and acoustic features can therefore provide enhanced accuracy. Sequential networks such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks are widely used in state-of-the-art natural language processing (NLP) models. A transformer model uses self-attention to compute representations of its inputs and outputs; unlike recurrent neural networks (RNNs), which process inputs sequentially, transformers can parallelize over input positions during training. In this work, we propose a multi-modal music mood classification system based on transformers and compare its performance with that of a bi-directional GRU (Bi-GRU)-based system, with and without attention. The performance is also analyzed against other state-of-the-art approaches. The proposed transformer-based model achieved higher accuracy than the Bi-GRU-based multi-modal system with single-layer attention, providing a maximum accuracy of 77.94%.
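The abstract's contrast between RNNs and transformers rests on self-attention computing all pairwise token interactions in one matrix product, rather than stepping through the sequence. A minimal NumPy sketch of single-head scaled dot-product self-attention illustrates this; the weight matrices and dimensions here are illustrative assumptions, not the authors' model.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token representations.
    Wq, Wk, Wv: (d_model, d_k) projection matrices (hypothetical parameters).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Every position attends to every other position in one matrix
    # product, so the whole sequence is processed in parallel --
    # the property that lets transformers train without sequential steps.
    scores = softmax(Q @ K.T / np.sqrt(d_k))
    return scores @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                       # 5 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one attended representation per token
```

In an RNN such as a Bi-GRU, computing the representation of token 5 requires first computing tokens 1 through 4; here the `Q @ K.T` product yields all positions' attention weights at once.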
Publisher
Universidad Nacional de La Plata
Subject
Artificial Intelligence, Computer Science Applications, Computer Vision and Pattern Recognition, Hardware and Architecture, Computer Science (miscellaneous), Software
Cited by
2 articles.