Author:
Xu Yong, Wang Yao, Liang Leilei, Song Nan
Abstract
Background: Single-cell RNA sequencing is necessary to understand tumor heterogeneity, and the cell type heterogeneity of lung adenocarcinoma (LUAD) has not been fully studied. Method: We first reduced the dimensionality of the GSE149655 single-cell data. We then statistically analysed the subpopulations obtained by cell annotation to find those highly enriched in tumor tissues. Monocle was used to predict the developmental trajectory of five subpopulations; BEAM was used to find the regulatory genes of the five branches; the q value was used to screen the key genes; and CellChat was used to analyse cell communication. Next, we screened for genes overlapping with the differentially expressed genes of TCGA-LUAD and established a prognostic risk model through univariate and multivariate analyses. To assess the independence of the model in clinical application, univariate and multivariate Cox regression were used to estimate the HR, its 95% CI and the p value. Finally, the novel biomarker genes were verified by qPCR and immunohistochemistry. Results: The single-cell dataset GSE149655 was subjected to quality control, filtering and dimensionality reduction. In total, 23 subpopulations were identified, and 11 cell subgroups were annotated among them. Statistical analysis of the 11 subgroups selected five important subgroups: lung epithelial cells, macrophages, neuroendocrine cells, secretory cells and T cells. Cell trajectory and cell communication analyses showed that the interactions among the five subpopulations are complex and that communication between them is dense. We believe that these five subpopulations play a very important role in the occurrence and development of LUAD.
After downloading the TCGA data, we screened the marker genes of these five subpopulations that are also differentially expressed in tumorigenesis, 462 genes in total, and constructed a 10-gene prognostic risk model based on the related genes. The 10-gene signature is robust and achieves stable prediction efficiency in datasets from different platforms. Two new molecular markers related to LUAD, HLA-DRB5 and CCDC50, were verified by qPCR and immunohistochemistry. The results showed that HLA-DRB5 expression was negatively correlated with the risk of LUAD, and CCDC50 expression was positively correlated with the risk of LUAD. Conclusion: We identified a prognostic risk model comprising CCL20, CP, HLA-DRB5, RHOV, CYP4B1, BASP1, ACSL4, GNG7, CCDC50 and SPATS2 as risk biomarkers and verified their predictive value for the prognosis of LUAD; these genes could serve as new therapeutic targets.
Funder
Natural Science Foundation of Shanghai
Subject
Genetics (clinical), Genetics, Molecular Medicine