Thangka Image Caption Generation Method Combining Multi-scale and Multi-level Aggregation

Published: 2024-03-06

Author:

Yue Chaoyang¹, Hu Wenjin², Zhang Fujun², Shi Xinyue²

Affiliation:

1. Hubei Vocational and Technical College

2. Northwest University for Nationalities

Abstract

Thangka is a distinctive form of Chinese painting, and automatically generating descriptions of Thangka images can help viewers appreciate and understand it more deeply. Thangka images contain diverse semantic objects at varying scales with distinctive spatial distributions, and Transformer-based encoding layers risk losing key image features. To address these challenges and improve the quality of Thangka image descriptions, this paper proposes a caption generation method that combines multi-scale and multi-level aggregation, named Multi-scale and Multi-level Aggregation (MMA). In the encoding stage, asymmetric convolutions strengthen the ability of the convolutional layers to capture spatial information, and a pyramid pooling module further integrates multi-scale contextual information from global and local regions of the Thangka image, yielding feature representations rich in semantic information. In the decoding stage, a multi-level aggregation network aggregates features from different encoding layers, making better use of the semantic information in higher encoding layers and the content information in lower encoding layers, which addresses the information-loss problem. Experimental results show that the proposed model performs well on the Thangka dataset: compared with the NIC model, it improves BLEU-4 by 26.7% and METEOR by 0.9%, and it generates more accurate descriptions.
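The two encoder components named in the abstract correspond to well-known building blocks: asymmetric convolution in the style of ACNet (parallel 3×3, 1×3 and 3×1 branches whose outputs are summed) and the pyramid pooling module from PSPNet (average pooling at several bin sizes, upsampled back to the input size and concatenated). The abstract does not give MMA's exact configuration, so the following is only a minimal single-channel sketch in plain Python under those assumptions. All function names, the bin sizes (1, 2, 3, 6), the zero padding and the nearest-neighbour upsampling are illustrative choices and do not come from the authors' code.

```python
"""Hypothetical single-channel sketch of two encoder ideas named in the MMA abstract:
ACNet-style asymmetric convolution and PSPNet-style pyramid pooling. Not the authors'
implementation: real encoders use multi-channel tensors, learned weights, batch norm,
1x1 convolutions after pooling, and bilinear upsampling."""

from typing import List, Sequence

Grid = List[List[float]]


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """Cross-correlate x with an odd-sized kernel k, zero-padding so the output keeps x's size."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= z < w:  # pixels outside the image count as zero
                        s += x[y][z] * k[a][b]
            out[i][j] = s
    return out


def asymmetric_conv(x: Grid, k3x3: Grid, k1x3: Sequence[float], k3x1: Sequence[float]) -> Grid:
    """Sum the outputs of parallel 3x3, 1x3 (horizontal) and 3x1 (vertical) branches."""
    square = conv2d_same(x, k3x3)
    horiz = conv2d_same(x, [list(k1x3)])        # one row, three columns
    vert = conv2d_same(x, [[v] for v in k3x1])  # three rows, one column
    return [[s + hz + v for s, hz, v in zip(rs, rh, rv)]
            for rs, rh, rv in zip(square, horiz, vert)]


def fuse_kernels(k3x3: Grid, k1x3: Sequence[float], k3x1: Sequence[float]) -> Grid:
    """Fold the 1x3 and 3x1 kernels into the centre row and column of a copy of the 3x3 kernel."""
    fused = [row[:] for row in k3x3]
    for b in range(3):
        fused[1][b] += k1x3[b]
    for a in range(3):
        fused[a][1] += k3x1[a]
    return fused


def adaptive_avg_pool(x: Grid, bins: int) -> Grid:
    """Average-pool x onto a bins x bins grid using floor/ceil bin edges, so no bin is empty."""
    h, w = len(x), len(x[0])
    out = []
    for p in range(bins):
        r0, r1 = (p * h) // bins, -(-(p + 1) * h // bins)  # floor start, ceil end
        row = []
        for q in range(bins):
            c0, c1 = (q * w) // bins, -(-(q + 1) * w // bins)
            cells = [x[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def upsample_nearest(x: Grid, h: int, w: int) -> Grid:
    """Resize a small grid to h x w by nearest-neighbour lookup."""
    bh, bw = len(x), len(x[0])
    return [[x[i * bh // h][j * bw // w] for j in range(w)] for i in range(h)]


def pyramid_pooling(x: Grid, bin_sizes: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return [x, context maps...]; a real module concatenates these along the channel axis."""
    h, w = len(x), len(x[0])
    return [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]


if __name__ == "__main__":
    x = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    k3 = [[0.1, 0.2, 0.1], [0.0, 0.5, 0.0], [-0.1, 0.2, -0.1]]
    k13, k31 = [1.0, -1.0, 0.5], [0.3, 0.0, -0.3]
    branched = asymmetric_conv(x, k3, k13, k31)
    fused = conv2d_same(x, fuse_kernels(k3, k13, k31))
    print(max(abs(a - b) for ra, rb in zip(branched, fused) for a, b in zip(ra, rb)) < 1e-9)  # True
    levels = pyramid_pooling(x)
    print(len(levels))      # 5: the input plus four pooled context maps
    print(levels[1][0][0])  # 7.5: the 1x1 bin holds the global mean
```

The check in the demo reflects the usual motivation for ACNet-style asymmetric convolution: the extra horizontal and vertical kernels reinforce the centre cross of the 3×3 kernel during training, and (ignoring batch normalization) they can be folded into a single 3×3 kernel afterwards at no inference cost. The abstract does not say whether MMA folds its kernels in this way.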

Publisher

Research Square Platform LLC
