Detecting images generated by diffusers

Published: 2024-07-10 Volume: 10 Page: e2127
ISSN: 2376-5992
Container-title: PeerJ Computer Science
Language: en

Author:

Coccomini Davide Alessandro¹,², Esuli Andrea¹, Falchi Fabrizio¹, Gennaro Claudio¹, Amato Giuseppe¹

Affiliation:

1. Institute of Information Science and Technologies “Alessandro Faedo”, Italian National Research Council, Pisa, Tuscany, Italy

2. Information Engineering, University of Pisa, Pisa, Tuscany, Italy

Abstract

In recent years, the field of artificial intelligence has witnessed a remarkable surge in the generation of synthetic images, driven by advancements in deep learning techniques. These synthetic images, often created through complex algorithms, closely mimic real photographs, blurring the lines between reality and artificiality. This proliferation of synthetic visuals presents a pressing challenge: how to accurately and reliably distinguish between genuine and generated images. This article, in particular, explores the task of detecting images generated by text-to-image diffusion models, highlighting the challenges and peculiarities of this field. To evaluate this, we consider images generated from captions in the MSCOCO and Wikimedia datasets using two state-of-the-art models: Stable Diffusion and GLIDE. Our experiments show that it is possible to detect the generated images using simple multi-layer perceptrons (MLPs), starting from features extracted by CLIP or RoBERTa, or using traditional convolutional neural networks (CNNs). These latter models achieve remarkable performances in particular when pretrained on large datasets. We also observe that models trained on images generated by Stable Diffusion can occasionally detect images generated by GLIDE, but only on the MSCOCO dataset. However, the reverse is not true. Lastly, we find that incorporating the associated textual information with the images in some cases can lead to a better generalization capability, especially if textual features are closely related to visual ones. We also discovered that the type of subject depicted in the image can significantly impact performance. This work provides insights into the feasibility of detecting generated images and has implications for security and privacy concerns in real-world applications. The code to reproduce our results is available at: https://github.com/davide-coccomini/Detecting-Images-Generated-by-Diffusers.
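The detection setup described in the abstract (a small MLP trained on pre-extracted features to separate real from generated images) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. It assumes the features have already been extracted, and it replaces them with synthetic Gaussian vectors so that it runs without any model or dataset. The names `make_synthetic_features`, `train_mlp`, and `predict`, as well as the layer size and learning rate, are hypothetical choices made for illustration only.

```python
import numpy as np


def make_synthetic_features(n_per_class=200, dim=32, shift=1.0, seed=0):
    """Hypothetical stand-in for pre-extracted image embeddings.

    "Real" vectors are drawn from N(0, I) and "generated" ones from
    N(shift, I). Label 0 = real, label 1 = generated.
    """
    rng = np.random.default_rng(seed)
    real = rng.normal(0.0, 1.0, size=(n_per_class, dim))
    fake = rng.normal(shift, 1.0, size=(n_per_class, dim))
    X = np.vstack([real, fake])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


def forward(params, X):
    """One hidden ReLU layer followed by a sigmoid output."""
    W1, b1, W2, b2 = params
    h_pre = X @ W1 + b1
    h = np.maximum(h_pre, 0.0)
    # Logits are clipped only for numerical safety in exp().
    logits = np.clip(h @ W2 + b2, -30.0, 30.0)
    p = 1.0 / (1.0 + np.exp(-logits))
    return h_pre, h, p


def train_mlp(X, y, hidden=16, lr=0.05, epochs=300, seed=0):
    """Full-batch gradient descent on the mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        h_pre, h, p = forward((W1, b1, W2, b2), X)
        g_logit = (p - y) / n            # gradient of mean BCE w.r.t. logits
        g_h = np.outer(g_logit, W2)      # backpropagate to the hidden layer
        g_h[h_pre <= 0.0] = 0.0          # ReLU passes gradient only where active
        gW2 = h.T @ g_logit
        gb2 = g_logit.sum()
        gW1 = X.T @ g_h
        gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2


def predict(params, X):
    """Return 1 for 'generated' and 0 for 'real'."""
    _, _, p = forward(params, X)
    return (p >= 0.5).astype(int)


if __name__ == "__main__":
    X_train, y_train = make_synthetic_features(seed=0)
    X_test, y_test = make_synthetic_features(seed=1)
    params = train_mlp(X_train, y_train)
    accuracy = (predict(params, X_test) == y_test).mean()
    print(f"held-out accuracy on synthetic features: {accuracy:.3f}")
```

With real data, the synthetic vectors would be replaced by embeddings computed once per image (and, where text is used, per caption), and the train and test sets would come from different generators or datasets to probe the cross-model generalization the abstract discusses. The accuracy printed here says nothing about real detection performance, since the toy classes are separable by construction.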

Funder

SERICS

MUR National Recovery and Resilience Plan funded by the European Union—NextGenerationEU

AI4Media

Publisher

PeerJ

Link

https://peerj.com/articles/cs-2127.pdf
