Affiliation:
1. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
2. Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
3. Research, Technology and Clinical, Medtronic Technology Center, Shanghai, China
4. Visualization and Robotics, Medtronic Technology Center, Shanghai, China
Abstract
Background: Spinal diseases burden an increasing number of patients. Fully automatic vertebra segmentation for CT images with arbitrary fields of view (FOVs) has become a fundamental research topic for computer-assisted spinal disease diagnosis and surgical intervention, and researchers have worked on this challenging task in recent years.
Purpose: This task suffers from challenges including intra-vertebra inconsistency of segmentation and poor identification of biterminal vertebrae in CT scans. Existing models also have limitations: some are difficult to apply to spinal cases with arbitrary FOVs, while others employ multi-stage networks with high computational cost. In this paper, we propose a single-staged model called VerteFormer that effectively addresses the challenges and limitations mentioned above.
Methods: The proposed VerteFormer exploits the strength of the Vision Transformer (ViT), which excels at mining global relations in the input data. The Transformer and UNet-based structure effectively fuses global and local features of vertebrae. Besides, we propose an Edge Detection (ED) block based on convolution and self-attention to separate neighboring vertebrae with clear boundary lines; it simultaneously promotes more consistent segmentation masks of vertebrae. To better identify the labels of vertebrae in the spine, particularly biterminal vertebrae, we further introduce global information generated by a Global Information Extraction (GIE) block.
Results: We evaluate the proposed model on two public datasets: MICCAI Challenge VerSe 2019 and 2020. VerteFormer achieves Dice scores of 86.39% and 86.54% on the public and hidden test sets of VerSe 2019, and 84.53% and 86.86% on VerSe 2020, outperforming other Transformer-based models and single-staged methods specifically designed for the VerSe Challenge. Additional ablation experiments validate the effectiveness of the ViT block, the ED block, and the GIE block.
Conclusions: We propose a single-staged Transformer-based model for fully automatic vertebra segmentation from CT images with arbitrary FOVs. ViT demonstrates its effectiveness in modeling long-range relations, and the ED block and GIE block improve vertebra segmentation performance. The proposed model can assist physicians in spinal disease diagnosis and surgical intervention, and it is also promising for generalization and transfer to other medical imaging applications.
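Since only the abstract is available here, the exact design of the ED block is not specified. The snippet below is a minimal, hypothetical sketch of how a block might combine a convolutional edge branch with spatial self-attention and fuse the two, in the spirit of the ED block described above; the class name, channel sizes, and wiring are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: combines convolution (local boundary cues) with
# self-attention (global context), then fuses both branches with a residual.
import torch
import torch.nn as nn


class EdgeAwareBlock(nn.Module):
    """Hypothetical edge-aware block: convolutional branch + self-attention branch."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise 3x3 convolution to capture fine boundary details.
        self.edge_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global branch: multi-head self-attention over flattened spatial tokens.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion of the two branches back to the original channel count.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.edge_conv(x)                      # (B, C, H, W)
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, global_feat], dim=1)) + x


if __name__ == "__main__":
    # Usage example on a dummy feature map.
    feat = torch.randn(1, 64, 32, 32)
    block = EdgeAwareBlock(channels=64)
    print(block(feat).shape)  # torch.Size([1, 64, 32, 32])
```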
Cited by
2 articles.