Identifying Malignant Breast Ultrasound Images Using ViT-Patch

Published:2023-03-09 Issue:6 Volume:13 Page:3489
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Feng Hao¹, Yang Bo¹, Wang Jingwen¹, Liu Mingzhe², Yin Lirong³, Zheng Wenfeng¹, Yin Zhengtong⁴, Liu Chao⁵

Affiliation:

1. School of Automation Engineering, University of Electronic Science and Technology, Chengdu 610000, China

2. School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325000, China

3. Department of Geography and Anthropology, Louisiana State University, Baton Rouge, LA 70803, USA

4. College of Resource and Environment Engineering, Guizhou University, Guiyang 550025, China

5. LIRMM, UMR 5506, CNRS-UM, 34095 Montpellier, France

Abstract

Recently, the Vision Transformer (ViT) model has been used for various computer vision tasks, owing to its advantage in extracting long-range features. To better integrate the long-range features useful for classification, the standard ViT adds a class token in addition to the patch tokens. Despite state-of-the-art results on some traditional vision tasks, the ViT model typically requires large datasets for supervised training, and thus it still faces challenges in areas where large datasets are difficult to build, such as medical image analysis. In the ViT model, only the output corresponding to the class token is fed to a Multi-Layer Perceptron (MLP) head for classification, and the outputs corresponding to the patch tokens are left unused. In this paper, we propose an improved ViT architecture (called ViT-Patch), which adds a shared MLP head to the output of each patch token to balance the feature learning on the class and patch tokens. In addition to the primary task, which uses the output of the class token to determine whether the image is malignant, a secondary task is introduced, which uses the output of each patch token to determine whether the patch overlaps with the tumor area. More interestingly, owing to the correlation between the primary and secondary tasks, the supervisory information added to the patch tokens helps improve the performance of the primary task on the class token. The introduction of secondary supervision information also improves the attention interaction among the class and patch tokens. In this way, the demand of the ViT on dataset size is reduced. The proposed ViT-Patch is validated on a publicly available dataset, and the experimental results show its effectiveness for both malignant identification and tumor localization.
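
The two-task setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`patch_labels`, `shared_head`, `bce`, `joint_loss`), the use of binary cross-entropy, the averaging over patches, and the `weight` parameter are all assumptions made for illustration, and a single linear layer stands in for the shared MLP head. The sketch shows how secondary-task targets could be derived from a tumor mask and how one set of head parameters could be applied to every patch token.

```python
import math

def patch_labels(mask, patch):
    """Label each non-overlapping patch 1 if it contains any tumor pixel, else 0.

    `mask` is a 2D list of 0/1 values; patches are visited in row-major order.
    (Assumed labeling rule: any overlap with the tumor area counts as positive.)
    """
    rows, cols = len(mask), len(mask[0])
    labels = []
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            hit = any(
                mask[r][c]
                for r in range(top, min(top + patch, rows))
                for c in range(left, min(left + patch, cols))
            )
            labels.append(1 if hit else 0)
    return labels

def shared_head(token, weights, bias):
    """Hypothetical stand-in for the shared head: one linear layer.

    The same `weights` and `bias` are reused for every patch token.
    """
    return sum(x * w for x, w in zip(token, weights)) + bias

def bce(logit, target):
    """Binary cross-entropy on one logit, in a numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def joint_loss(class_logit, image_label, patch_logits, patch_targets, weight=1.0):
    """Primary loss on the class token plus a weighted mean loss over patches.

    The weighting scheme is an assumption; the abstract does not specify it.
    """
    primary = bce(class_logit, image_label)
    secondary = sum(bce(z, t) for z, t in zip(patch_logits, patch_targets))
    secondary /= len(patch_targets)
    return primary + weight * secondary

# Usage: a 4x4 mask with two tumor pixels, split into 2x2 patches.
mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
targets = patch_labels(mask, patch=2)
# Toy 2-dimensional patch-token outputs, one per patch (made-up values).
tokens = [[1.0, 0.5], [-1.0, 0.0], [-0.5, -0.5], [2.0, 1.0]]
logits = [shared_head(t, weights=[1.0, 1.0], bias=0.0) for t in tokens]
loss = joint_loss(class_logit=0.8, image_label=1,
                  patch_logits=logits, patch_targets=targets)
```

Because the patch targets come from the same tumor mask that determines where the lesion lies, the secondary loss gives every patch token its own training signal, which is the balancing effect the abstract attributes to the added head.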

Funder

Sichuan Science and Technology Support Program

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/6/3489/pdf

Cited by 33 articles.

1. Vision transformer promotes cancer diagnosis: A comprehensive review;Expert Systems with Applications;2024-10

2. Comparison of Vision Transformers and Convolutional Neural Networks in Medical Image Analysis: A Systematic Review;Journal of Medical Systems;2024-09-12

3. MCV-UNet: a modified convolution & transformer hybrid encoder-decoder network with multi-scale information fusion for ultrasound image semantic segmentation;PeerJ Computer Science;2024-06-24

4. Identifying HRV patterns in ECG signals as early markers of dementia;Expert Systems with Applications;2024-06

5. Identification of optimal semantic segmentation architecture for the segmentation of hepatic structures from computed tomography images;Multimedia Tools and Applications;2024-04-08