A multimodal turn in Digital Humanities. Using contrastive machine learning models to explore, enrich, and analyze digital visual historical collections

Published:2023-03-15 Issue:3 Volume:38 Page:1267-1280
ISSN:2055-7671
Container-title:Digital Scholarship in the Humanities
language:en

Author:

Smits Thomas¹, Wevers Melvin²

Affiliation:

1. Faculty of Arts, University of Antwerp, Antwerp, Belgium

2. Department of History, Faculty of Humanities, University of Amsterdam, Amsterdam, The Netherlands

Abstract

Until recently, most research in the Digital Humanities (DH) was monomodal, meaning that the object of analysis was either textual or visual. Seeking to integrate multimodality theory into the DH, this article demonstrates that recently developed multimodal deep learning models, such as Contrastive Language Image Pre-training (CLIP), offer new possibilities to explore and analyze image–text combinations at scale. These models, which are trained on image and text pairs, can be applied to a wide range of text-to-image, image-to-image, and image-to-text prediction tasks. Moreover, multimodal models show high accuracy in zero-shot classification, i.e. predicting unseen categories across heterogeneous datasets. Based on three exploratory case studies, we argue that this zero-shot capability opens up the way for a multimodal turn in DH research. Moreover, multimodal models allow scholars to move past the artificial separation of text and images that was dominant in the field and analyze multimodal meaning at scale. However, we also need to be aware of the specific (historical) bias of multimodal deep learning that stems from biases in the training data used to train these models.

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications, Linguistics and Language, Language and Linguistics, Information Systems

Link

https://academic.oup.com/dsh/article-pdf/38/3/1267/51309490/fqad008.pdf

References: 50 articles.

1. The Family in English Children's Literature

2. Distant viewing: analyzing large visual corpora;Arnold;Digital Scholarship in the Humanities,2019

3. Le message photographique;Barthes;Communications,1961

4. Text and Image

5. The digitization of newspaper archives: opportunities and challenges for historians;Bingham;Twentieth Century British History,2010

Cited by 8 articles.

1. Revisiting the Kahn collection: multimodal artificial intelligence and visual patterns of presence and absence in the Archives de la Planète, 1909–1931;Visual Studies;2024-08-12

2. Datafication of audiovisual archives: from practice mapping to a thinking model;Journal of Documentation;2024-03-05

3. Rethinking multimodal corpora from the perspective of Peircean semiotics;Frontiers in Communication;2024-02-12

4. Prompt Me a Dataset: An Investigation of Text-Image Prompting for Historical Image Dataset Creation Using Foundation Models;Lecture Notes in Computer Science;2024

5. Enriching Cultural Heritage through the Integration of Art and Digital Technologies;Social Sciences;2023-10-26