Abstract
Being a significant repository of Buddhist imagery, Thangka images are valuable historical materials of Tibetan studies, which covers many domains such as Tibetan history, politics, culture, social life and even traditional medicine and astronomy. Thangka cultural element images are the essence of Thangka images. Hence Thangka cultural element images classification is one of the most important work of knowledge representation and mining in the field of Thangka, and is the foundation of digital protection of Thangka images. However, due to the limited quantity, high complexity and the intricate textures of Thangka images, the classification of Thangka images is limited to a small number of categories and coarse granularity. Thus a novel fusion texture feature dual-branch Thangka cultural elements classification model based on the attention mechanism and self-supervised contrastive learning has been proposed in this paper. Specifically, to address the issue of insufficient labeled samples and improve the classification performance, this method utilizes a large amount of unlabeled irrelevant data to pre-train the feature extractor through self-supervised learning. During the fine-tuning stage of the downstream task, a dual-branch feature extraction structure incorporating texture features has been designed, and MS-Triplet Attetnion proposed by us is used for the integration of important features. Additionally, to address the problem of sample imbalance and the existence of a large number of difficult samples in the Thangka cultural element data set, the Gradient Harmonizing Mechanism Loss has been adopted, and it has been improved by introducing a self designed adaptive mechanism. The experimental results on Thangka cultural elements dataset prove the superiority of the proposed method over the state-of-the-art methods.The source code of our proposed algorithm and the related datasets is available at https://github.com/WiniTang/MS-BiCLR.