Affiliation:
1. Institute of Agricultural Information, Jiangsu Academy of Agricultural Sciences, Nanjing, China
2. Department of Agricultural Machinery, College of Engineering, Nanjing Agricultural University, Nanjing, China
3. Lincoln Agri‐Robotics Centre, Lincoln Institute for Agri‐food Technology, University of Lincoln, Lincoln, UK
4. Lincoln Centre for Autonomous Systems (L‐CAS), University of Lincoln, Lincoln, UK
Abstract
The current mainstream approaches for plant organ counting are based on convolutional neural networks (CNNs), which have a strong local feature extraction capability. However, CNNs inherently struggle with robust global feature extraction owing to their limited receptive fields. The visual transformer (ViT) provides a new opportunity to complement CNNs' capability, as it can readily model global context. In this context, we propose a deep learning network based on a convolution‐free ViT backbone (tea chrysanthemum‐visual transformer [TC‐ViT]) to achieve accurate, real‐time counting of tea chrysanthemums (TCs) at their early flowering stage under unstructured environments. First, fixed‐size patches cropped from the original image are linearly projected into a one‐dimensional vector sequence and fed into a progressive multiscale ViT backbone to capture feature sequences at multiple scales. Subsequently, the obtained feature sequences are reshaped into two‐dimensional image features, and a multiscale perceptual field module serves as the regression head to handle the overall scale and density variation. The resulting model was tested on 400 field images from the collected TC test data set, showing that the proposed TC‐ViT achieved a mean absolute error of 12.32 and a mean square error of 15.06, with an inference speed of 27.36 FPS (512 × 512 image size) on an NVIDIA Tesla V100 GPU. It is also shown that light variation had the greatest effect on TC counting, whereas blurring had the least effect. The proposed method enables accurate counting of high‐density, occluded objects in field environments, and this perception system could be deployed on a robotic platform for selective harvesting and flower phenotyping.
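The pipeline described above (patch tokenisation, global self-attention, reshaping the token sequence back into a 2-D map, and counting by integrating a density map) can be illustrated with a minimal numpy sketch. This is a hypothetical toy, not the authors' TC-ViT: the patch size, embedding dimension, single attention layer, and the mean-pooled "density" stand-in for the multiscale perceptual field module are all illustrative assumptions.

```python
import numpy as np

def patchify(img, p):
    """Split an H×W image into non-overlapping p×p patches, flattened
    into a sequence of 1-D vectors (ViT-style tokenisation)."""
    h, w = img.shape
    patches = img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p)              # (num_patches, p*p)

def self_attention(x):
    """One toy self-attention layer: every token attends to all others,
    giving the global context that plain CNNs lack."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

def count_from_density(density):
    """Counting-by-regression: the count is the integral of the density map."""
    return density.sum()

rng = np.random.default_rng(0)
img = rng.random((64, 64))                         # stand-in for a field image
p = 16
tokens = patchify(img, p)                          # (16, 256) token sequence
proj = tokens @ rng.standard_normal((p * p, 32))   # linear projection, embed dim 32 (assumed)
feat = self_attention(proj)                        # global-context features
side = 64 // p
fmap = np.abs(feat.mean(axis=1)).reshape(side, side)  # sequence reshaped to a 2-D map
print(fmap.shape, count_from_density(fmap))
```

In the actual network, the single attention layer would be a progressive multiscale stack, and the mean-pooled map would be replaced by the learned multiscale regression head producing a calibrated density map.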