Enhancing the Accuracy of an Image Classification Model Using Cross-Modality Transfer Learning

Published:2023-08-02 Issue:15 Volume:12 Page:3316
ISSN:2079-9292
Container-title:Electronics
language:en

Author:

Liu Jiaqi¹, Chui Kwok Tai¹, Lee Lap-Kei¹

Affiliation:

1. Department of Electronic Engineering and Computer Science, School of Science and Technology, Hong Kong Metropolitan University, Hong Kong, China

Abstract

Applying deep learning (DL) algorithms to image classification becomes more challenging when training data are insufficient. Transfer learning (TL) has been proposed to address this problem. In theory, TL requires only a small amount of knowledge to be transferred to the target task, but traditional TL often requires the source and target domains to share the same or similar features. Cross-modality transfer learning (CMTL) relaxes this requirement by learning knowledge in a source domain completely different from the target domain, often one with a large amount of data, which helps the model learn more features. Most existing research on CMTL has focused on image-to-image transfer. In this paper, the CMTL problem is formulated from the text domain to the image domain. Our study started by training two separate pre-trained models, one in the text domain and one in the image domain, to obtain the network structure. The knowledge of the two pre-trained models was then transferred via CMTL to obtain a new hybrid model combining the BERT and BEiT models. Next, GridSearchCV and 5-fold cross-validation were used to identify the most suitable combination of hyperparameters (batch size and learning rate) and optimizer (SGDM or ADAM) for the model. To evaluate their impact, 48 two-tuple hyperparameter combinations and two well-known optimizers were used. The performance evaluation metrics were validation accuracy, F1-score, precision, and recall. The ablation study confirms that the hybrid model improved accuracy by 12.8% compared with the original BEiT model. In addition, the results show that batch size and learning rate can significantly affect model performance.
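
The hyperparameter search described in the abstract can be pictured with a minimal sketch. The abstract names GridSearchCV; this sketch instead hand-writes the same pattern (an exhaustive grid scored by 5-fold cross-validation) using only the Python standard library. The grid values, the function names, and the scoring function are all hypothetical: the abstract reports 48 (batch size, learning rate) pairs and two optimizers but does not list the values, so the 6 x 8 grid below is a placeholder chosen only to give 48 pairs.

```python
from itertools import product
from statistics import mean

# Hypothetical grid: placeholder values chosen so that 6 x 8 = 48 pairs.
BATCH_SIZES = [8, 16, 32, 64, 128, 256]
LEARNING_RATES = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2]
OPTIMIZERS = ["SGDM", "ADAM"]


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def grid_search(train_and_score, n_samples, k=5):
    """Return (best_config, best_mean_score) over the full grid.

    train_and_score(config, train_idx, val_idx) is a caller-supplied function
    that trains a model on one fold and returns its validation accuracy.
    """
    best_config, best_score = None, float("-inf")
    for batch_size, lr, opt in product(BATCH_SIZES, LEARNING_RATES, OPTIMIZERS):
        config = {"batch_size": batch_size, "learning_rate": lr, "optimizer": opt}
        scores = [train_and_score(config, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        score = mean(scores)  # average validation score across the k folds
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def dummy_score(config, train_idx, val_idx):
    """Stand-in for real training: a made-up score, not a result from the paper."""
    penalty = abs(config["batch_size"] - 32) / 256 + abs(config["learning_rate"] - 1e-4)
    bonus = 0.01 if config["optimizer"] == "ADAM" else 0.0
    return 0.9 - penalty + bonus


if __name__ == "__main__":
    best, score = grid_search(dummy_score, n_samples=100)
    print(best, round(score, 4))
```

In practice `dummy_score` would be replaced by a routine that fine-tunes the model on the training indices and evaluates it on the validation indices; the search loop itself would stay the same.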

Funder

Katie Shu Sui Pui Charitable Trust—Research Training Fellowship

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/15/3316/pdf

References: 47 articles.

1. A Survey on Deep Learning: Algorithms, Techniques, and Applications;Pouyanfar;ACM Comput. Surv.,2018

2. Investigation of Transfer Learning for Image Classification and Impact on Training Sample Size;Zhu;Chemom. Intell. Lab. Syst.,2021

3. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions;Alzubaidi;J. Big Data,2021

4. Li, Q., Cai, W., Wang, X., Zhou, Y., Feng, D.D., and Chen, M. (2014, January 10–12). Medical Image Classification with Convolutional Neural Network. Proceedings of the 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), Singapore.

5. Opportunities and Challenges for Machine Learning in Rare Diseases;Decherchi;Front. Med.,2021

Cited by 1 article.

1. Lung tumor cell classification with lightweight mobileNetV2 and attention-based SCAM enhanced faster R-CNN;Evolving Systems;2024-01-23