Affiliation:
1. West China Hospital of Medicine: West China Hospital of Sichuan University
2. York Hospital
3. Sixth People's Hospital of Chengdu
4. West China School of Medicine: West China Hospital of Sichuan University
5. West China Hospital of Sichuan University
Abstract
Background
Lung cancer is the leading cause of malignancy-associated mortality worldwide. Early-stage lung cancer often manifests without typical symptoms, frequently leading to late-stage diagnoses and grim prognoses. Therefore, the timely and precise identification of lung cancer in high-risk individuals is particularly significant. However, the development of machine learning-based models using peripheral blood-derived transcriptomic markers for early lung cancer detection remains unexplored.
Methods
Using a training cohort (GSE135304), we combined multiple machine learning algorithms to formulate the Lung Cancer Diagnostic Score (LCDS), utilizing transcriptomic features within peripheral blood samples. To evaluate the LCDS model’s accuracy, we computed the area under the receiver operating characteristic (ROC) curve (AUC) in validation cohorts (GSE42834, GSE157086, and an in-house dataset). Immune infiltration and pathway enrichment analyses were conducted to explore potential associations between the LCDS and lung cancer pathogenesis.
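As a rough illustration of the evaluation metric named above, the AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it directly from that definition and averages it across cohorts. The function names (`roc_auc`, `mean_auc`), the cohort labels, and all scores are hypothetical placeholders, not values or code from this study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random case > score of a random control); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(cohorts):
    """Unweighted mean of per-cohort AUCs; `cohorts` maps name -> (labels, scores)."""
    aucs = [roc_auc(labels, scores) for labels, scores in cohorts.values()]
    return sum(aucs) / len(aucs)


# Hypothetical toy cohorts (1 = lung cancer, 0 = control); scores are invented.
cohorts = {
    "cohort_a": ([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.1]),
    "cohort_b": ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]),
}
auc_a = roc_auc(*cohorts["cohort_a"])
auc_b = roc_auc(*cohorts["cohort_b"])
overall = mean_auc(cohorts)
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; a rank-based formula would be the usual choice for larger cohorts.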
Results
Initial screening, based on univariable logistic regression in conjunction with ROC analysis, identified 844 genes. Subsequently, 87 genes selected via Boruta feature selection were incorporated into 97 machine learning algorithms to construct the LCDS model. The highest accuracy was achieved using the random forest (RF) algorithm, incorporating the expression of 87 genes, with a mean AUC value of 0.938. A lower LCDS was significantly associated with elevated immune scores and increased CD4+ and CD8+ T cells. Furthermore, individuals in the higher LCDS group exhibited pronounced activation of the hypoxia, PPAR, and Toll-like receptor (TLR) signaling pathways and reduced DNA damage repair pathway scores.
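The initial single-gene screening step might be sketched as follows. This is a simplified stand-in, not the authors' procedure: it ranks each gene by its own direction-agnostic AUC instead of fitting a univariable logistic regression, and the `min_auc` cutoff, the gene names, and the expression values are all hypothetical, since the abstract does not report the selection thresholds.

```python
def roc_auc(labels, values):
    """AUC as P(value of a random case > value of a random control); ties count 0.5."""
    pos = [v for y, v in zip(labels, values) if y == 1]
    neg = [v for y, v in zip(labels, values) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def screen_genes(expression, labels, min_auc=0.7):
    """Keep genes whose single-gene AUC reaches min_auc (hypothetical cutoff)."""
    kept = {}
    for gene, values in expression.items():
        auc = roc_auc(labels, values)
        # Treat down-regulated genes the same as up-regulated ones.
        auc = max(auc, 1.0 - auc)
        if auc >= min_auc:
            kept[gene] = auc
    # Return genes ordered from most to least discriminative.
    return dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))


# Invented toy data: 3 cases followed by 3 controls.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_UP": [5.1, 4.8, 4.9, 3.0, 3.2, 2.9],
    "GENE_DOWN": [1.0, 1.2, 0.9, 2.5, 2.2, 1.1],
    "GENE_NOISE": [2.0, 3.0, 1.0, 2.5, 1.5, 3.5],
}
selected = screen_genes(expression, labels)
```

In this toy input, `GENE_UP` and `GENE_DOWN` pass the cutoff and `GENE_NOISE` does not. A screened list like this would then go to a second-stage selector such as Boruta before model fitting.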
Conclusions
An LCDS based on machine learning targeting transcriptomic features in peripheral blood was highly accurate in distinguishing lung cancer patients from healthy individuals. Additionally, individuals in the high LCDS group exhibited diminished antitumor immunity and augmented activity of signaling pathways that drive tumorigenesis and progression. The results of this study might facilitate early prediction of lung cancer and further promote precision treatment for lung cancer patients.
Publisher
Research Square Platform LLC