Abstract
Microsatellite instability (MSI) arises from a defective DNA mismatch repair (MMR) system and is prevalent across many cancer types. MSI status is classified as MSI-High (MSI-H), MSI-Low (MSI-L), or microsatellite stable (MSS), and the latter two are sometimes combined into a single category, MSI-L/MSS. Determining MSI status (i.e., MSI-H vs. MSI-L/MSS) in colorectal cancer (CRC) is critical for guiding immunotherapy and assessing prognosis. Conventional molecular tests for MSI are expensive, time-consuming, and constrained by experimental conditions. Deep learning methods applied to histopathological images have advanced MSI detection, yet few studies have tried to improve predictive accuracy by integrating histopathological images with clinical data. This study first compared clinical information between the MSI-H and MSI-L/MSS groups and found significant differences in the N and M stages. Next, texture features were extracted from the images of both groups using the gray-level co-occurrence matrix (GLCM), revealing notable differences in the mean values of these features. Finally, multimodal compact bilinear pooling (MCB) was used to fuse histopathological images with clinical data. Applied to CRC data from The Cancer Genome Atlas (TCGA), this framework achieved an area under the curve (AUC) of 0.833 for predicting MSI status under 5-fold cross-validation. The model determined MSI status more accurately than existing unimodal MSI prediction methods and other contemporary techniques. Additionally, the regions of whole-slide images (WSI) that were important for determining MSI labels were visualized. In summary, this study presents an accurate multimodal deep learning model for predicting MSI in colorectal cancer by integrating histopathological images and clinical data, together with a method for visualizing the WSI regions that inform MSI status.
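The fusion step can be illustrated with a minimal sketch of compact bilinear pooling in its standard formulation. Each input vector is projected with a Count Sketch, and the two sketches are combined by element-wise multiplication in the FFT domain. This operation is a circular convolution and equals a Count Sketch of the outer product of the two vectors. This is not the study's implementation. The function names (`make_sketch_params`, `count_sketch`, `mcb_fuse`), the sketch dimension `d`, and the feature sizes in the usage lines are hypothetical choices for illustration.

```python
import numpy as np


def make_sketch_params(n, d, rng):
    """Draw Count Sketch hash indices h in [0, d) and random signs s in {-1, +1}."""
    h = rng.integers(0, d, size=n)
    s = rng.choice(np.array([-1.0, 1.0]), size=n)
    return h, s


def count_sketch(x, h, s, d):
    """Project an n-dim vector x to d dims: y[h[i]] += s[i] * x[i]."""
    y = np.zeros(d)
    np.add.at(y, h, s * x)  # add.at accumulates correctly when hash indices collide
    return y


def mcb_fuse(x, q, params_x, params_q, d):
    """Compact bilinear pooling of x and q into a d-dim vector.

    Multiplying the two sketches' FFTs and inverting gives their circular
    convolution, which equals a Count Sketch of the outer product x q^T.
    """
    px = count_sketch(x, *params_x, d)
    pq = count_sketch(q, *params_q, d)
    return np.fft.irfft(np.fft.rfft(px) * np.fft.rfft(pq), n=d)


# Hypothetical usage: a 512-dim image embedding and an 8-dim clinical vector.
rng = np.random.default_rng(0)
image_feat = rng.normal(size=512)
clinical_feat = rng.normal(size=8)
d = 1024
params_img = make_sketch_params(image_feat.size, d, rng)
params_clin = make_sketch_params(clinical_feat.size, d, rng)
fused = mcb_fuse(image_feat, clinical_feat, params_img, params_clin, d)
print(fused.shape)  # (1024,)
```

In a typical setup, the hash and sign parameters are drawn once and kept fixed for training and inference, so that every sample is projected the same way. The fused vector would then feed a classifier head that separates MSI-H from MSI-L/MSS.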