Affiliation:
1. School of Basic Medical Sciences, Shanxi Medical University, Taiyuan, Shanxi Province, 030001, P.R. China
2. Department of Computer Engineering, Taiyuan Institute of Technology, Taiyuan, Shanxi Province, 030008, P.R. China
Abstract
Background:
Stomach cancer, also known as gastric adenocarcinoma, remains one of the most
common and deadly cancers worldwide. Early diagnosis and prevention are effective in improving
the 5-year survival rate of patients. Therefore, it is important to discover specific biomarkers for
early diagnosis and drug treatment. This study investigates the potential key genes and signaling
pathways involved in gastric cancer.
Methods:
The gene expression profiles GSE63089, GSE33335, and GSE79973 were retrieved to
identify Differentially Expressed Genes (DEGs) across a total of 80 gastric cancer samples
and 80 normal samples. A total of 1423 upregulated and 1155 downregulated genes were screened, and the
overlapping DEGs, visualized via Venn diagrams, comprised 58 upregulated and 43 downregulated
genes. These overlapping DEGs were evaluated with Gene Ontology (GO) enrichment, Kyoto Encyclopedia
of Genes and Genomes (KEGG) enrichment, and Protein-Protein Interaction (PPI) network
analysis. Using DAVID software, we identified several genes enriched in both GO and KEGG
analyses. PPI analysis was performed with STRING software, and three submodules were obtained with
Cytoscape software. We then used cytoHubba with 12 classification methods to select candidate
hub genes. Group 1 genes (enriched in GO and KEGG pathways) were intersected with group 2 genes
(supported by nine algorithms) and group 3 genes (clustered in the three submodules). This
intersection of groups 1, 2, and 3 yielded nine hub genes, whose prognostic values were estimated with
GEPIA. The expression levels of LUM and COL1A1 were significantly associated with survival outcomes,
indicating prognostic value (P-value = 0.013 for LUM and P-value = 0.042 for COL1A1).
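The intersection of the three gene groups described above can be sketched with plain set operations. This is a minimal illustration only: the helper `intersect_groups` and the `GENE_*` entries are hypothetical placeholders, not the study's actual gene lists, and only LUM and COL1A1 are named in the abstract.

```python
from functools import reduce


def intersect_groups(*groups):
    """Return the genes present in every group, sorted for stable output."""
    return sorted(reduce(set.intersection, (set(g) for g in groups)))


# Hypothetical placeholder memberships; NOT the study's actual gene lists.
group1 = {"COL1A1", "LUM", "GENE_A", "GENE_B"}  # enriched in GO and KEGG
group2 = {"COL1A1", "LUM", "GENE_B", "GENE_C"}  # supported by the ranking algorithms
group3 = {"COL1A1", "LUM", "GENE_C", "GENE_D"}  # clustered in the submodules

hub_candidates = intersect_groups(group1, group2, group3)
print(hub_candidates)
```

Only genes supported by all three lines of evidence survive the intersection, which is why the candidate list shrinks at this step.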
Results:
Five machine learning methods were employed to validate whether the two hub genes
(COL1A1, LUM) can distinguish cancer samples from non-cancer samples. XGBoost achieved an accuracy
of 0.9375, with a precision and specificity of 1.000. LR and MLP achieved the highest
recall (1.0000), and the AUC was 1.0000. In the test set GSE65801, the accuracy
of all models was greater than 80%, and the XGBoost model obtained the highest prediction
accuracy of 0.8906. A precision of 0.9301 and a specificity of 0.9375 were obtained. MLP achieved the highest
recall (0.8750), and the AUC was 0.9082. The correlation of prognostic indicators with
the tumor-infiltrating immune cell levels was analyzed using TIMER.
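For readers less familiar with the metrics reported above, the sketch below shows how accuracy, precision, recall, and specificity follow from a binary confusion matrix. The function name `classification_metrics` and the counts are illustrative assumptions, not values taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    Assumes each denominator is non-zero (at least one predicted positive,
    one actual positive, and one actual negative).
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all samples classified correctly
        "precision": tp / (tp + fp),     # fraction of predicted cancers that are cancers
        "recall": tp / (tp + fn),        # fraction of cancers that are detected
        "specificity": tn / (tn + fp),   # fraction of normals correctly rejected
    }


# Hypothetical counts chosen for illustration; NOT the study's confusion matrix.
metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A precision and specificity of 1 both correspond to zero false positives, which is why the two values coincide for XGBoost in the first evaluation.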
Conclusion:
The hub genes identified in this study by integrating bioinformatics and machine learning
methods may enhance the understanding of the molecular mechanism of gastric cancer and may
serve as potential therapeutic targets.
Funder
Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi,
Taiyuan Institute of Technology Youth Academic Leader Support Program, and Excellent Youth Foundation of Shanxi Scientific Committee
Publisher
Bentham Science Publishers Ltd.
Subject
Organic Chemistry, Computer Science Applications, Drug Discovery, General Medicine