Affiliation:
1. Lanzhou University, Lanzhou, China
2. School of Information Science & Engineering, Lanzhou University, Lanzhou, China
Abstract
Depression is an illness that involves emotional and mental health. Currently, depression detection through interviews is the most popular way. With the advancement of natural language processing and sentiment analysis, automated interview-based depression detection is strongly supported. However, current multimodal depression detection models fail to adequately capture the fine-grained features of depressive behaviors, making it difficult for the models to accurately characterize the subtle changes in depressive symptoms. To address this problem, we propose a Multi Fine-Grained Fusion Network (MFFNet). The core idea of this model is to extract and fuse the information of different scale feature pairs through a Multi-Scale Fastformer (MSfastformer), and then use the Recurrent Pyramid Model to integrate the features of different resolutions, promoting the interaction of multi-level information. Through the interaction of multi-scale and multi-resolution features, it aims to explore richer feature representations. To validate the effectiveness of our proposed MFFNet model, we conduct experiments on two depression interview datasets. The experimental results show that the MFFNet model performs better in depression detection compared to other benchmark multimodal models.
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities
Publisher
Association for Computing Machinery (ACM)