Abstract
Background: Head and neck cancer predominantly originates from the mucosal layer of the upper aerodigestive tract, with squamous cell carcinoma representing the majority of cases. Therefore, a comprehensive endoscopic examination of the oral cavity and upper aerodigestive tract serves as the primary diagnostic method for these cancers. While deep learning, particularly in computer vision, has been extensively researched for lesion segmentation in diagnostic endoscopy of other sites, such as colon polyps and gastric lesions, there have been limited reports on deep learning algorithms specifically tailored for segmenting head and neck squamous cell carcinoma.
Methods: This study is a case series investigating artificial intelligence algorithms for segmenting head and neck squamous cell carcinoma (HNSCC) in endoscopic images captured between 2016 and 2020. The images were sourced from the Department of Otolaryngology-Head and Neck Surgery at Kaohsiung Veterans General Hospital, a tertiary medical center in southern Taiwan. All images were rigid endoscopy photographs of tumors histologically confirmed as squamous cell carcinoma (SCC) through biopsy or surgical excision. Importantly, all tumors were photographed at the initial presentation of the disease, prior to any surgical or chemo-radiotherapy intervention.
We introduce SCC-Net, a novel modification of a model based on Neural Architecture Search (NAS) U-Net, tailored for segmenting the enrolled endoscopic photos. The modification incorporates a new technique termed "Learnable Discrete Wavelet Pooling," which combines outputs from different layers using a channel attention module that assigns weights according to their importance in the information flow. Additionally, we integrated the cross-stage-partial design from CSPNet. To evaluate performance, we compared SCC-Net with eight other state-of-the-art image segmentation models.
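As a rough illustration of the idea behind wavelet-based pooling with learned weights, the sketch below downsamples a feature map by mixing the four subbands of a single-level Haar transform with per-channel softmax weights. It is not the SCC-Net implementation: the abstract does not specify the wavelet basis or the attention design, so the Haar basis, the softmax weighting, and the names `haar_subbands` and `weighted_wavelet_pool` are all assumptions made for illustration.

```python
import numpy as np

def haar_subbands(x):
    """Single-level 2D Haar transform of x with shape (C, H, W), H and W even.

    Returns four arrays of shape (C, H/2, W/2): the low-pass band and three
    detail bands (differences along width, along height, and diagonal).
    """
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0
    detail_w = (a - b + c - d) / 2.0  # differences along the width axis
    detail_h = (a + b - c - d) / 2.0  # differences along the height axis
    detail_d = (a - b - c + d) / 2.0  # diagonal differences
    return low, detail_w, detail_h, detail_d

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_wavelet_pool(x, logits):
    """Halve the spatial size of x by a per-channel weighted sum of Haar subbands.

    logits has shape (4, C). In a trained network these values would be
    learned (hypothetical here); they are converted to weights with softmax.
    """
    bands = np.stack(haar_subbands(x))             # (4, C, H/2, W/2)
    w = softmax(logits, axis=0)[:, :, None, None]  # (4, C, 1, 1)
    return (w * bands).sum(axis=0)                 # (C, H/2, W/2)

# Usage: a constant 2-channel 4x4 feature map with equal subband weights.
x = np.ones((2, 4, 4))
out = weighted_wavelet_pool(x, np.zeros((4, 2)))
print(out.shape)
```

With a large logit on the low-pass band this reduces to scaled average pooling, while non-zero weights on the detail bands let edge information survive the downsampling step.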
Results: We collected a total of 556 pathologically confirmed SCC photos of the oral cavity, oropharynx, hypopharynx, and glottis. SCC-Net achieved a mean Intersection over Union (mIoU) of 87.2%, an accuracy of 97.17%, and a recall of 97.15%. Compared with eight other state-of-the-art image segmentation neural network models, our model performed best in mIoU, Dice similarity coefficient (DSC), accuracy, and recall.
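For readers unfamiliar with the reported metrics, the following minimal sketch computes mean Intersection over Union, Dice similarity coefficient, accuracy, and recall for a single binary segmentation mask. The abstract does not state how the scores were aggregated, so averaging IoU over the lesion and background classes is an assumption here, and the function name `segmentation_metrics` is hypothetical.

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Pixel-wise metrics for binary masks of equal shape (1 = lesion, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))    # lesion pixels correctly predicted
    fp = int(np.sum(pred & ~target))   # background predicted as lesion
    fn = int(np.sum(~pred & target))   # lesion predicted as background
    tn = int(np.sum(~pred & ~target))  # background correctly predicted

    def safe_div(num, den):
        # Return 0.0 when a class is absent instead of dividing by zero.
        return num / den if den > 0 else 0.0

    iou_lesion = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    return {
        # Assumption: mean IoU is the average over the two classes.
        "mIoU": (iou_lesion + iou_background) / 2,
        "DSC": safe_div(2 * tp, 2 * tp + fp + fn),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "recall": safe_div(tp, tp + fn),
    }

# Usage: a toy 2x4 mask with one false-positive pixel.
pred = np.array([[1, 1, 0, 0], [0, 1, 0, 0]])
target = np.array([[1, 1, 0, 0], [0, 0, 0, 0]])
m = segmentation_metrics(pred, target)
print(m)
```

In this toy case there are 2 true positives, 1 false positive, 0 false negatives, and 5 true negatives, giving an mIoU of 0.75, a DSC of 0.8, an accuracy of 0.875, and a recall of 1.0. Accuracy can remain high even when a lesion is poorly outlined because background pixels dominate, which is why overlap metrics such as mIoU and DSC are reported alongside it.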
Conclusions: Our proposed SCC-Net architecture successfully segmented lesions from white light endoscopic images with promising accuracy, demonstrating consistent performance across all upper aerodigestive tract areas.