Affiliation:
1. Computer Engineering Department, College of Engineering and Technology, Arab Academy for Science, Technology and Maritime Transport, Alexandria 1029, Egypt
Abstract
Social networks have become deeply integrated into our daily lives, leading to an increase in image sharing across different platforms. At the same time, robust and user-friendly media editors not only facilitate artistic innovation but also raise concerns about how easily misleading media can be created. This highlights the need for new, advanced techniques for the image copy detection task, which involves evaluating whether photos or videos originate from the same source. This research introduces a novel application of the Vision Transformer (ViT) model to the image copy detection task on the DISC21 dataset. Our approach strategically samples the extensive DISC21 training set using K-means clustering to obtain a representative subset. Additionally, we apply complex augmentation pipelines of varying intensity during training. Our methodology follows the instance discrimination concept, in which the Vision Transformer model is used as a classifier that maps different augmentations of the same image to the same class. The trained ViT model then extracts descriptors of original and manipulated images, which are post-processed to reduce their dimensionality. Our best-performing model, tested on a refined query set of 10K augmented images from the DISC21 dataset, attained a state-of-the-art micro-average precision of 0.79, demonstrating the effectiveness of our approach.
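The micro-average precision reported above can be illustrated with a minimal sketch. It assumes the usual definition of the metric: all (query, reference) predictions are pooled into a single list ranked by confidence, and average precision is computed over that list against the full set of true copy pairs. The function name `micro_average_precision`, the tuple layout, and the toy identifiers are illustrative assumptions, not the authors' evaluation code; ties between scores and duplicate predictions are not treated specially here.

```python
from typing import Iterable, Set, Tuple

Prediction = Tuple[str, str, float]  # (query_id, reference_id, confidence score)
Pair = Tuple[str, str]               # (query_id, reference_id)


def micro_average_precision(predictions: Iterable[Prediction],
                            ground_truth: Set[Pair]) -> float:
    """Average precision over all predictions pooled across queries.

    Assumes each (query_id, reference_id) pair appears at most once
    in `predictions` (hypothetical input format).
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one pair")

    # Rank every prediction globally, highest confidence first.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (query_id, reference_id, _score) in enumerate(ranked, start=1):
        if (query_id, reference_id) in ground_truth:
            true_positives += 1
            # Precision at this rank, accumulated at each correct match.
            precision_sum += true_positives / rank

    # Dividing by all true pairs penalizes copies that were never retrieved.
    return precision_sum / len(ground_truth)


# Toy example: three true copy pairs, two of them retrieved.
toy_truth = {("q1", "r1"), ("q2", "r2"), ("q3", "r3")}
toy_predictions = [("q1", "r1", 0.9), ("q2", "r9", 0.8), ("q2", "r2", 0.7)]
print(micro_average_precision(toy_predictions, toy_truth))  # 5/9, about 0.556
```

Because predictions are ranked jointly rather than per query, a single score threshold must work across all queries, which is why confidently wrong matches for one query lower the score for the whole set.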