Affiliation:
1. Faculty of Technical Sciences, University of Novi Sad, 21000 Novi Sad, Serbia
2. Faculty of Sciences, University of Novi Sad, 21000 Novi Sad, Serbia
Abstract
Human–machine interaction covers a range of applications in which machines should understand human commands and predict human behavior. People's moods change over time, which affects the way they interact, particularly through changes in speech style and facial expressions. Because interaction requires quick decisions, low latency is critical for real-time processing. Edge devices, strategically placed near the data source, minimize processing time and enable real-time decision-making. Edge computing also allows data to be processed locally, reducing the need to send sensitive information further through the network. Despite the wide adoption of audio-only, video-only, and multimodal emotion recognition systems, there is a research gap in analyzing lightweight models and solving privacy challenges to improve model performance. This motivated us to develop a privacy-preserving, lightweight audiovisual emotion recognition model that is deployable on constrained edge devices. The model is CNN-based, as CNNs are frequently used for processing audio and video modalities. It is further paired with a federated learning protocol to preserve the privacy of local clients on edge devices and to improve detection accuracy. The results show that adopting federated learning improved classification accuracy by ~2% and that the proposed federated learning-based model performs competitively with baseline audiovisual emotion recognition models.
Funder
European Union’s Horizon 2020 research
Cited by
5 articles.