The Photographic Pipeline of Machine Vision; or, Machine Vision's Latent Photographic Theory

Published: 2023-10-01 Issue: 1-2 Volume: 1
ISSN: 2834-703X
Container-title: Critical AI
Language: en

Author:

Malevé, Nicolas; Sluis, Katrina

Abstract

Despite computer vision's extensive mobilization of cameras, photographers, and viewing subjects, photography's place in machine vision remains undertheorized. This article illuminates an operative theory of photography that exists in a latent form, embedded in the tools, practices, and discourses of machine vision research and enabling the methodological imperatives of dataset production. Focusing on the development of the canonical object recognition dataset ImageNet, the article analyzes how the dataset pipeline translates the radical polysemy of the photographic image into a stable and transparent form of data that can be portrayed as a proxy of human vision. Reflecting on the prominence of the photographic snapshot in machine vision discourse, the article traces the path that made this popular cultural practice amenable to the dataset. Following the evolution from nineteenth-century scientific photography to the acquisition of massive sets of online photos, the article shows how dataset creators inherit and transform a form of “instrumental realism,” a photographic enterprise that aims to establish a generalized look from contingent instances in the pursuit of statistical truth. The article concludes with a reflection on how the latent photographic theory of machine vision we have advanced relates to the large image models built for generative AI today.

Publisher

Duke University Press
