Abstract
Introduction
Lung cancer (LC) is the second most frequent cancer worldwide, with high incidence and mortality rates, and non-small-cell lung cancer (NSCLC) accounts for 80-85% of LC cases. NSCLC is mainly sub-classified into adenocarcinoma (AC) and squamous-cell carcinoma (SCC). Late diagnosis at an advanced stage, a high rate of metastasis, and the development of therapy resistance are responsible for approximately 95% of mortality. Because the subtypes are highly heterogeneous and variable, classifying them precisely is important for treatment. This is challenging in clinical practice, however, because it requires accurate quantification of the proportion of each subtype, which is time-consuming and sometimes error-prone. The lower airways are home to a dynamic bacterial population sustained by the immigration, elimination, and migration of microbes from the gastrointestinal tract and upper airways. Disruption of the homeostasis of microbiome composition has been found to correlate with an increased risk of LC. Artificial intelligence (AI) techniques are used extensively in the early screening and treatment of NSCLC and have made significant strides in recent years. The recent use of CT/MRI scan image data in prediction models, however, also produces false positives and requires follow-up tests, which delays prognostication. Early diagnosis, prevention, and treatment are therefore critical to improve survival and reduce mortality. Here, we aim to classify AC and SCC from the lung microbiome of lung tissue samples using AI-based algorithms.
Methods
We obtained raw sequencing data from the NCBI online database and analyzed the lung microbiome in lung tissue samples from patients: 149 AC samples and 145 SCC samples. Metadata such as patient age, sex, smoking history, and environmental material (malignant or not) were also analyzed.
Using these data, machine learning algorithms were applied to select the best microbiome features for subtype classification.
Results
A supervised machine learning (ML) and deep learning (DL) based model was developed that discriminates NSCLC subtypes from their microbial information, exploring the microbiome as a predictive signal for early screening. Seventeen features were identified as biomarkers; on the validation dataset they distinguished AC from SCC with an accuracy of 81% using k-nearest neighbors (KNN) and 71% using a deep neural network (DNN).
Conclusion
This study proposes a supervised machine learning framework that relies on taxonomic features and AI techniques to classify overlapping AC and SCC metagenomic data, supporting the lung microbiome as a predictive and diagnostic biomarker in LC. The framework should also help other researchers identify further biomarkers and analyze overlapping subtypes in other diseases.
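As a rough illustration of the workflow summarized above (select a small set of microbiome features, then classify AC versus SCC with KNN), the following is a minimal sketch rather than the study's pipeline. The abstract does not state which feature-selection method, value of k, or train/validation split was used, so the mean-difference filter, k = 5, the 80/20 split, and all function names here are assumptions, and the data are synthetic stand-ins for taxon abundances.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def make_toy_data(n_ac=149, n_scc=145, n_taxa=60, n_informative=17, seed=0):
    """Synthetic stand-in for per-sample taxon abundances (NOT the study's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n, shift in (("AC", n_ac, 0.0), ("SCC", n_scc, 1.0)):
        for _ in range(n):
            # Only the first n_informative taxa differ between the two subtypes.
            X.append([rng.gauss(shift if j < n_informative else 0.0, 1.0)
                      for j in range(n_taxa)])
            y.append(label)
    return X, y


def select_features(X, y, k):
    """Assumed filter: keep the k taxa with the largest class-mean gap,
    scaled by the overall spread of that taxon."""
    scores = []
    for j in range(len(X[0])):
        ac = [row[j] for row, lab in zip(X, y) if lab == "AC"]
        scc = [row[j] for row, lab in zip(X, y) if lab == "SCC"]
        spread = pstdev(ac + scc) or 1.0  # guard against a constant column
        scores.append((abs(mean(ac) - mean(scc)) / spread, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(zip((math.dist(t, x) for t in train_X), train_y))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def run_sketch(n_selected=17, k=5, seed=0):
    X, y = make_toy_data(seed=seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))  # assumed 80/20 train/validation split
    train, valid = idx[:cut], idx[cut:]
    # Select features on the training split only, so validation labels do not leak.
    selected = select_features([X[i] for i in train], [y[i] for i in train], n_selected)
    train_X = [[X[i][j] for j in selected] for i in train]
    train_y = [y[i] for i in train]
    hits = sum(
        knn_predict(train_X, train_y, [X[i][j] for j in selected], k) == y[i]
        for i in valid
    )
    return selected, hits / len(valid)


selected, accuracy = run_sketch()
print(len(selected), "features selected; validation accuracy:", round(accuracy, 2))
```

The accuracy printed by this sketch reflects only the synthetic data and says nothing about the reported 81% result. Selecting features inside the training split, as done here, is one common way to keep a validation estimate from being optimistically biased.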
Publisher
Cold Spring Harbor Laboratory