Abstract
Background
Diagnosing mediastinal tumours, including incidental lesions, on low-dose CT (LDCT) performed for lung cancer screening is challenging. Proper characterisation and surgical planning often require additional invasive and costly tests. This points to a gap in existing diagnostic methods, the need for a more efficient and patient-centred approach, and the potential for artificial intelligence to address that gap. This study aimed to develop a multimodal hybrid transformer model, based on the Vision Transformer, that combines LDCT features and clinical data to improve surgical decision-making for patients with incidentally detected mediastinal tumours.
Methods
This retrospective study analysed patients with mediastinal tumours between 2010 and 2021. Patients eligible for surgery (n=30) were considered ‘positive’, whereas those without tumour enlargement (n=32) were considered ‘negative’. We developed a hybrid model combining a convolutional neural network with a transformer to integrate imaging and clinical data. The dataset was split in a 5:3:2 ratio for training, validation and testing. Model performance was evaluated using receiver operating characteristic (ROC) analysis across 25 iterations of random assignment and compared with conventional radiomics models and with models excluding clinical data.
Results
The multimodal hybrid model achieved a mean area under the curve (AUC) of 0.90, significantly outperforming the model without clinical data (AUC=0.86, p=0.04) and the radiomics models (random forest AUC=0.81, p=0.008; logistic regression AUC=0.77, p=0.004).
Conclusion
Integrating clinical and LDCT data using a hybrid transformer model can improve surgical decision-making for mediastinal tumours, showing superiority over models lacking clinical data integration.
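The evaluation protocol described in the Methods (a 5:3:2 random split repeated 25 times, with the test-set AUC averaged across iterations) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the `fit_and_score` callback, the rounding of split sizes, and the redrawing of splits whose test set contains only one class are all assumptions introduced here.

```python
import random
from statistics import mean


def split_532(indices, rng):
    """Shuffle case indices and split them 5:3:2 into train/validation/test.

    Rounding the split sizes to whole cases is an assumption.
    """
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(0.5 * n)
    n_val = round(0.3 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Tied scores count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_test_auc(labels, fit_and_score, n_iterations=25, seed=0):
    """Average the test-set AUC over repeated random 5:3:2 assignments.

    `fit_and_score(train, val, test)` is a hypothetical callback that trains
    a model on the train/validation indices and returns one score per test case.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    aucs = []
    for _ in range(n_iterations):
        # Assumption: redraw any split whose test set holds a single class,
        # since AUC is undefined there.
        while True:
            train, val, test = split_532(indices, rng)
            if len({labels[i] for i in test}) == 2:
                break
        scores = fit_and_score(train, val, test)
        aucs.append(roc_auc([labels[i] for i in test], scores))
    return mean(aucs)
```

With the 62 cases reported above (30 positive, 32 negative), this split yields 31 training, 19 validation and 12 test cases per iteration. The same loop could be run once per candidate model to obtain the per-model mean AUCs that are then compared.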
Funder
Bayer Yakuhin
St. Luke’s Health Science Research