Enhanced Audio Tagging via Multi- to Single-Modal Teacher-Student Mutual Learning

Published: 2021-05-18 Issue: 12 Volume: 35 Page: 10709-10717
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Yin Yifang, Shrivastava Harsh, Zhang Ying, Liu Zhenguang, Shah Rajiv Ratn, Zimmermann Roger

Abstract

Recognizing ongoing events based on acoustic cues has been a critical yet challenging problem that has attracted significant research attention in recent years. Joint audio-visual analysis can improve event detection accuracy, but it is not always feasible, because in many real-world scenarios only audio recordings are available. To address these challenges, we present a novel visual-assisted teacher-student mutual learning framework for robust sound event detection from audio recordings. Our model adopts a multi-modal teacher network based on both acoustic and visual cues, and a single-modal student network based on acoustic cues only. Conventional teacher-student learning performs unsatisfactorily for knowledge transfer from a multi-modal network to a single-modal network. We therefore present a mutual learning framework that introduces a single-modal transfer loss and a cross-modal transfer loss to collaboratively learn the audio-visual correlations between the two networks. Our proposed solution takes advantage of joint audio-visual analysis during training while maximizing the feasibility of the model in use cases where only audio is available. Our extensive experiments on the DCASE17 and DCASE18 sound event detection datasets show that our proposed method outperforms state-of-the-art audio tagging approaches.
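
The abstract describes the training setup only at a high level. As an illustration, here is a minimal sketch of a generic teacher-student mutual learning objective for multi-label audio tagging, in the style of deep mutual learning: each network is trained on the tag labels and is also pulled toward the other network's per-tag predictions. This is not the authors' method. The function names, the sigmoid outputs with binary cross-entropy, the KL mimicry terms, and the `weight` parameter are all hypothetical assumptions, and the paper's actual single-modal and cross-modal transfer losses are not reproduced here.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-7  # clamp probabilities away from 0 and 1 so the logs stay finite


def sigmoid(z: float) -> float:
    """Numerically stable logistic function: logit -> per-tag probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def bce(y: float, p: float) -> float:
    """Binary cross-entropy between a 0/1 tag label y and a probability p."""
    p = _clamp(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def binary_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) for a single tag."""
    p, q = _clamp(p), _clamp(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_learning_losses(
    labels: Sequence[float],
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    weight: float = 1.0,
) -> Tuple[float, float]:
    """Return (teacher_loss, student_loss) for one clip (hypothetical helper).

    Each loss = mean supervised BCE over tags + weight * mean KL toward the
    other network's predictions. In a real training framework the other
    network's predictions would be treated as a fixed target (no gradient).
    """
    t = [sigmoid(z) for z in teacher_logits]  # assumed audio-visual teacher
    s = [sigmoid(z) for z in student_logits]  # assumed audio-only student
    n = len(labels)
    sup_t = sum(bce(y, p) for y, p in zip(labels, t)) / n
    sup_s = sum(bce(y, p) for y, p in zip(labels, s)) / n
    # Student mimics teacher: KL(teacher || student), and vice versa.
    student_mimic = sum(binary_kl(pt, ps) for pt, ps in zip(t, s)) / n
    teacher_mimic = sum(binary_kl(ps, pt) for pt, ps in zip(t, s)) / n
    return sup_t + weight * teacher_mimic, sup_s + weight * student_mimic


if __name__ == "__main__":
    # Two tags: the first is present, the second absent.
    print(mutual_learning_losses([1.0, 0.0], [2.0, -1.0], [0.5, 0.5]))
```

When the teacher and student produce identical logits, both mimicry terms vanish and each loss reduces to plain binary cross-entropy on the tags; the `weight` parameter controls how strongly each network is pulled toward the other when they disagree.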

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 7 articles.

1. DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues; Proceedings of the 31st ACM International Conference on Multimedia; 2023-10-26

2. Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization; Proceedings of the 31st ACM International Conference on Multimedia; 2023-10-26

3. Prototypical Cross-domain Knowledge Transfer for Cervical Dysplasia Visual Inspection; Proceedings of the 31st ACM International Conference on Multimedia; 2023-10-26

4. Cross-Modality Mutual Learning for Enhancing Smart Contract Vulnerability Detection on Bytecode; Proceedings of the ACM Web Conference 2023; 2023-04-30

5. Transferring Audio Deepfake Detection Capability across Languages; Proceedings of the ACM Web Conference 2023; 2023-04-30