Learning Self-distilled Features for Facial Deepfake Detection Using Visual Foundation Models: General Results and Demographic Analysis
Published: 2024-07-09
Issue: 1
Volume: 15
Pages: 682-694
ISSN: 2763-7719
Container-title: Journal on Interactive Systems
Language: English
Short-container-title: JIS
Author:
Cunha, Yan Martins Braz Gurevitz; Gomes, Bruno Rocha; Boaro, José Matheus C.; Moraes, Daniel de Sousa; Busson, Antonio José Grandson; Duarte, Julio Cesar; Colcher, Sérgio
Abstract
Modern deepfake techniques produce highly realistic false media content with the potential to spread harmful information, including fake news and incitements to violence. Deepfake detection methods aim to identify and counteract such content by employing machine learning algorithms, focusing mainly on detecting the presence of manipulation through spatial and temporal features. These methods often rely on Foundation Models trained on extensive unlabeled data through self-supervised approaches. This work extends previous research on deepfake detection, examining the effectiveness of these models while also considering biases, particularly concerning age, gender, and ethnicity, for ethical analysis. In experiments on the diverse Deepfake Detection Challenge Dataset, which encompasses varied lighting conditions, resolutions, and demographic attributes, features from DINOv2, a recent Vision Transformer-based Foundation Model, combined with a CNN classifier improved deepfake detection while exhibiting minimal bias towards these demographic characteristics.
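As a minimal sketch of how DINOv2 features can be paired with a small convolutional classifier for this kind of detection (assuming a frozen ViT-S/14 backbone loaded from the official facebookresearch/dinov2 torch.hub entry point; the DinoDeepfakeDetector class, its head architecture, input resolution, and preprocessing below are illustrative assumptions, not the authors' exact pipeline):

```python
# Illustrative sketch: a frozen DINOv2 backbone supplies patch-level features
# and a small (hypothetical) CNN head classifies each face crop as real or fake.
import torch
import torch.nn as nn

class DinoDeepfakeDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen self-distilled ViT backbone (DINOv2 ViT-S/14, 384-dim tokens).
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Lightweight CNN head over the 16x16 grid of patch tokens (assumed design).
        self.head = nn.Sequential(
            nn.Conv2d(384, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, 1),  # single logit: deepfake vs. real
        )

    def forward(self, x):  # x: (B, 3, 224, 224) ImageNet-normalized face crops
        with torch.no_grad():
            # Patch tokens from the frozen backbone: (B, 256, 384) for 224x224 input.
            tokens = self.backbone.forward_features(x)["x_norm_patchtokens"]
        b, n, c = tokens.shape
        side = int(n ** 0.5)  # 16 patches per side with patch size 14
        fmap = tokens.permute(0, 2, 1).reshape(b, c, side, side)
        return self.head(fmap)  # (B, 1) logits

# Usage: score a batch of face crops.
model = DinoDeepfakeDetector().eval()
logits = model(torch.randn(2, 3, 224, 224))
probs = torch.sigmoid(logits)  # probability that each crop is a deepfake
```

Training such a head on DFDC face crops with a binary cross-entropy loss, while keeping the backbone frozen, follows the general feature-extractor-plus-classifier setup described in the abstract.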
Publisher
Sociedade Brasileira de Computação - SBC