Abstract
Natural language processing research on poetry struggles to recognize holistic content such as poetic symbolism, metaphor, and other fine-grained attributes. Given these challenges, multi-modal image–poetry reasoning and retrieval remain largely unexplored. Our recent accessibility study indicates that poetry is an effective medium for conveying visual artwork attributes, improving artwork appreciation for people with visual impairments. We therefore introduce a deep learning approach for automatically retrieving poetry suited to an input image. The recent state-of-the-art CLIP model matches multi-modal visual and text features using cosine similarity. However, it lacks shared cross-modality attention features to model fine-grained relationships. The proposed approach takes advantage of CLIP's strong pre-training and overcomes this limitation by introducing shared attention parameters that better model the fine-grained relationship between the two modalities. We test and compare our approach on the expertly annotated MultiM-Poem dataset, which is considered the largest public image–poetry pair dataset for English poetry. The proposed approach addresses image-based attribute recognition and the automatic retrieval of fine-grained poetic verses. The test results show that the shared attention parameters improve fine-grained attribute recognition and that the proposed approach is a significant step towards automatic multi-modal retrieval for improved artwork appreciation by people with visual impairments.
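The cosine-similarity baseline the abstract describes can be made concrete with a minimal sketch, assuming the open-source CLIP package (github.com/openai/CLIP) and a hypothetical image path and candidate poem list; it illustrates only the CLIP matching step, not the paper's shared-attention extension.

```python
# Minimal sketch: rank candidate poems against an image with CLIP's
# cosine similarity. Poems and "artwork.jpg" are hypothetical placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

poems = [
    "The fog comes on little cat feet.",   # hypothetical candidates
    "I wandered lonely as a cloud.",
]

image = preprocess(Image.open("artwork.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(poems, truncate=True).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)

# Cosine similarity is the dot product of L2-normalized embeddings.
image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
scores = (image_feat @ text_feat.T).squeeze(0)

best = scores.argmax().item()
print(f"Best-matching poem: {poems[best]!r} (score {scores[best]:.3f})")
```

Because the image and text encoders here never attend to each other, the score captures only global agreement between the two embeddings, which is the fine-grained modeling gap the proposed shared attention parameters are meant to address.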
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
1 article.