Affiliation:
1. Communication University of China, China
Abstract
With the continuous advancement of deepfake technology, there has been a surge in the creation of realistic fake videos. Unfortunately, the malicious use of deepfakes poses a significant threat to societal morality and political security, and numerous researchers have therefore proposed deepfake detection methods. However, traditional detection approaches tend to focus on specific forgery features, such as artifacts or inconsistent actions, which makes them vulnerable to specialized countermeasures. Recent studies show an intrinsic correlation between facial and audio cues that can be exploited for deepfake detection. To address these challenges and improve the robustness and generalization of deepfake detection algorithms, we propose a novel joint audio-visual deepfake detection model named AVA-CL, which detects deepfakes in both the audio and visual domains. By exploiting the inherent correlation and consistency between the audio and visual modalities, AVA-CL significantly improves detection effectiveness. Through extensive experiments, we demonstrate that our proposed AVA-CL model outperforms many state-of-the-art (SOTA) methods, with superior robustness and generalization capabilities. This research presents a promising approach to detecting deepfakes and reducing the harm caused by their malicious use.
Funder
National Key Research and Development Program
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications,Hardware and Architecture
Cited by
2 articles.