Abstract
The ability of machines to understand subjective human emotions is an essential step toward realizing artificial intelligence. Extracting and utilizing information from audio signals remains a challenging task. By transforming acoustic signals into time-frequency representations in the form of spectrograms, advanced algorithms from computer vision can be applied to acoustics. In this paper, we propose a Speech Emotion Recognition (SER) system based on Swin-Transformer (Swin). In addition to verifying the feasibility of Swin for the SER task, we also compare the effectiveness of various types of spectrograms under the same model parameters. Our model is validated on the IEMOCAP dataset and achieves competitive performance.
Subject
Computer Science Applications,History,Education
Cited by
3 articles.