Predicting manipulated regions in deepfake videos using convolutional vision transformers

Published: 2024-07-19 Issue: 2 Volume: 2 Page: 1409
ISSN: 3029-2786
Container-title: Computing and Artificial Intelligence
Short-container-title: Comput. Artif. Intell.

Author:

Bhandari Mohan, Shrestha Sushant, Karki Utsab, Adhikari Santosh, Gaihre Rajan

Abstract

Deepfake technology, which uses artificial intelligence to create and manipulate realistic synthetic media, poses a serious threat to the trustworthiness and integrity of digital content. Deepfakes can be used to generate, swap, or modify faces in videos, altering the appearance, identity, or expression of individuals. This study presents an approach to deepfake detection based on a convolutional vision transformer (CViT), a hybrid model that combines convolutional neural networks (CNNs) and vision transformers (ViTs). The proposed method uses a 20-layer CNN to extract learnable features from face images and a ViT to classify them as real or fake. It also employs MTCNN, a multi-task cascaded convolutional network, to detect and align faces in videos, improving the accuracy and efficiency of the face extraction process. The method is assessed on the FaceForensics++ dataset, which comprises 15,800 images sourced from 1,600 videos. With an 80:10:10 split ratio, the experimental results show that the proposed method achieves an accuracy of 92.5% and an AUC of 0.91. Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations highlight the distinctive image regions used in making each decision. The proposed method demonstrates a high capability for distinguishing genuine from manipulated videos, contributing to the enhancement of media authenticity and security.
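
To make the 80:10:10 split concrete, the following minimal sketch partitions 15,800 image indices into three subsets. It is an illustration only, not the authors' code: the function name `split_indices`, the fixed seed, and the train/validation/test reading of the ratio are assumptions, and the abstract does not say whether the split was made per image or per video.

```python
import random


def split_indices(n_items, ratios=(80, 10, 10), seed=0):
    """Shuffle item indices and partition them into three lists.

    `ratios` are integer percentages (assumed here to mean
    train/validation/test); integer arithmetic avoids float rounding.
    """
    total = sum(ratios)
    if total != 100:
        raise ValueError("ratios must sum to 100")

    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    n_train = n_items * ratios[0] // total
    n_val = n_items * ratios[1] // total

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the last subset
    return train, val, test


# 15,800 images, as reported in the abstract
train, val, test = split_indices(15_800)
print(len(train), len(val), len(test))  # 12640 1580 1580
```

An image-level split like this can place frames from the same video in different subsets. A video-level split (grouping the image indices by source video before shuffling) is a common way to avoid that kind of leakage, though the abstract does not indicate which approach was used.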

Publisher

Academic Publishing Pte. Ltd.

References (13 articles)

1. Karnouskos S. Artificial Intelligence in Digital Media: The Era of Deepfakes. IEEE Transactions on Technology and Society. 2020; 1(3): 138-147. doi: 10.1109/tts.2020.3001312

2. Grobler GD. Narrative strategies in the creation of animated poetry-film [PhD thesis]. University of South Africa; 2021.

3. Wodajo D, Atnafu S, Akhtar Z. Deepfake video detection using generative convolutional vision transformer. Available online: https://arxiv.org/abs/2307.07036 (accessed on 20 May 2024).

4. Heidari A, Jafari Navimipour N, Dag H, et al. Deepfake detection using deep learning methods: A systematic and comprehensive review. WIREs Data Mining and Knowledge Discovery. 2023; 14(2). doi: 10.1002/widm.1520

5. Kearns L, Alam A, Allison J. Synthetic media authentication threats: Detection using a combination of neural network and blockchain technology. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4658121 (accessed on 20 May 2024).