CTDUNet: A Multimodal CNN–Transformer Dual U-Shaped Network with Coordinate Space Attention for Camellia oleifera Pests and Diseases Segmentation in Complex Environments
Author:
Guo Ruitian1, Zhang Ruopeng1, Zhou Hao1, Xie Tunjun1, Peng Yuting1, Chen Xili1, Yu Guo2, Wan Fangying1, Li Lin1, Zhang Yongzhong1, Liu Ruifeng3
Affiliation:
1. School of Electronic Information and Physics, Central South University of Forestry and Technology, Changsha 410004, China
2. School of Business, Central South University of Forestry and Technology, Changsha 410004, China
3. School of Forestry, Central South University of Forestry and Technology, Changsha 410004, China
Abstract
Camellia oleifera is a crop of high economic value, yet it is particularly susceptible to various diseases and pests that significantly reduce its yield and quality. Consequently, the precise segmentation and classification of diseased Camellia leaves are vital for managing pests and diseases effectively. Deep learning offers significant advantages in the segmentation of plant diseases and pests, particularly in complex image processing and automated feature extraction. However, when single-modal models are used to segment Camellia oleifera diseases, three critical challenges arise: (A) lesions may closely resemble the complex background in color; (B) small sections of diseased leaves overlap; and (C) multiple diseases may be present on a single leaf. These factors considerably hinder segmentation accuracy. To address them, we propose a novel multimodal model, the CNN–Transformer Dual U-shaped Network (CTDUNet), which is built on a CNN–Transformer architecture and integrates image and text information. The model first uses text data to compensate for the shortcomings of single-modal image features, enhancing its ability to distinguish lesions from environmental characteristics even when the two closely resemble one another. Additionally, we introduce Coordinate Space Attention (CSA), which focuses on the positional relationships between targets, thereby improving the segmentation of overlapping leaf edges. Furthermore, cross-attention (CA) is employed to align image and text features effectively, preserving local information and enhancing the perception and differentiation of various diseases. CTDUNet was evaluated on a self-built multimodal dataset and compared against several models, including DeepLabV3+, UNet, PSPNet, SegFormer, HRNet, and Language meets Vision Transformer (LViT).
The experimental results demonstrate that CTDUNet achieved a mean Intersection over Union (mIoU) of 86.14%, surpassing the compared multimodal models and the best single-modal model by 3.91% and 5.84%, respectively. Additionally, CTDUNet delivers well-balanced performance across classes in the multi-class segmentation of Camellia oleifera diseases and pests. These results indicate that fusing image and text information can be applied successfully to the segmentation of Camellia diseases, achieving outstanding performance.
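For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code: the function name `mean_iou`, the flattened-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the paper may handle the background class or absent classes differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for flattened per-pixel class labels.

    pred, target: equal-length sequences of integer class ids.
    Classes that appear in neither sequence are skipped (an assumption;
    other conventions count them as IoU = 0 or 1).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three hypothetical classes (0, 1, 2).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Averaging per-class IoU rather than pooling all pixels gives each class equal weight, which is why the metric is sensitive to how evenly a model performs across classes.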
Funder
National Natural Science Foundation of China
Education Department Key Program of Hunan Province