Machine Learning in Identifying Marker Genes for Congenital Heart Diseases of Different Cardiac Cell Types
Author:
Ma Qinglan 1, Zhang Yu-Hang 2, Guo Wei 3, Feng Kaiyan 4, Huang Tao 5,6, Cai Yu-Dong 1
Affiliation:
1. School of Life Sciences, Shanghai University, Shanghai 200444, China
2. Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
3. Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
4. Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
5. Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
6. CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
Abstract
Congenital heart disease (CHD) represents a spectrum of inborn heart defects influenced by genetic and environmental factors. This study analyzed gene expression profiles at the single-cell level in 21,034 cardiac fibroblasts, 73,296 cardiomyocytes, and 35,673 endothelial cells using machine learning techniques. Six groups were investigated for each cardiac cell type: dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy (HCM), heart failure with hypoplastic left heart syndrome (HF_HLHS), neonatal hypoplastic left heart syndrome (Neo_HLHS), tetralogy of Fallot (TOF), and donor hearts (used as healthy controls). Each cell sample was represented by 29,266 gene features. These features were first analyzed by six feature-ranking algorithms, yielding several ranked feature lists. These lists were then fed into an incremental feature selection procedure, which incorporated two classification algorithms, to extract essential gene features and classification rules and to build efficient classifiers. The identified essential genes may serve as potential CHD markers in different cardiac cell types. For instance, LASSO identified key genes specific to various cardiac cell types in CHD subtypes. FOXO3 was found to be up-regulated in cardiac fibroblasts in both DCM and HCM. In cardiomyocytes, the distinct genes TMTC1, ART3, ARHGAP24, SHROOM3, and XIST were linked to DCM, Neo_HLHS, HCM, HF_HLHS, and TOF, respectively. Endothelial cell analysis further revealed COL25A1, NFIB, and KLF7 as significant genes for DCM, HCM, and TOF. LightGBM, CatBoost, MCFS, RF, and XGBoost further delineated key genes for specific CHD subtypes, demonstrating the efficacy of machine learning in identifying CHD-specific genes.
Additionally, this study developed quantitative rules for representing the gene expression patterns related to CHDs. This research underscores the potential of machine learning in unraveling the molecular complexities of CHD and establishes a foundation for future mechanism-based studies.
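The two-stage workflow described in the abstract (rank the gene features, then run incremental feature selection over the ranked list) can be illustrated with a minimal sketch. This is not the study's implementation: the variance-ratio ranker, the nearest-centroid classifier, the fold layout, and the synthetic data are all assumptions chosen to keep the example dependency-free, and every function name here is hypothetical. The study itself used six ranking algorithms and two classification algorithms that this sketch does not reproduce.

```python
import random
from statistics import mean, pvariance


def make_toy_data(n_per_class=30, n_genes=20, n_informative=4, seed=0):
    """Synthetic cells-by-genes matrix; only the first n_informative genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(3.0 * label if g < n_informative else 0.0, 1.0)
                   for g in range(n_genes)]
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y):
    """Rank genes by variance of class means divided by mean within-class variance
    (a stand-in ranker, not one of the six algorithms named in the abstract)."""
    labels = sorted(set(y))
    scored = []
    for g in range(len(X[0])):
        groups = [[row[g] for row, lab in zip(X, y) if lab == c] for c in labels]
        between = pvariance([mean(grp) for grp in groups])
        within = mean(pvariance(grp) for grp in groups)
        scored.append((between / (within + 1e-12), g))
    return [g for _, g in sorted(scored, reverse=True)]


def nearest_centroid_accuracy(X, y, genes, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier on the chosen genes."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        centroids = {}
        for c in set(y):
            rows = [X[i] for i in train if y[i] == c]
            centroids[c] = [mean(r[g] for r in rows) for g in genes]
        for i in test:
            dist = {c: sum((X[i][g] - m) ** 2 for g, m in zip(genes, cen))
                    for c, cen in centroids.items()}
            correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranking, step=2):
    """Score the top-k ranked genes for growing k; return the best k and all scores."""
    scores = {k: nearest_centroid_accuracy(X, y, ranking[:k])
              for k in range(step, len(ranking) + 1, step)}
    best_k = max(scores, key=scores.get)
    return best_k, scores


X, y = make_toy_data()
ranking = rank_features(X, y)
best_k, scores = incremental_feature_selection(X, y, ranking)
print("top-ranked genes:", ranking[:4])
print("best feature count:", best_k, "accuracy:", round(scores[best_k], 3))
```

In this sketch the feature subset with the highest cross-validated accuracy is taken as the "essential" set; with real expression data, the ranker and classifier would be replaced by the algorithms the study actually used, and the step size would need to suit the 29,266-feature scale.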
Funder
Strategic Priority Research Program of Chinese Academy of Sciences
National Key R&D Program of China
Fund of the Key Laboratory of Tissue Microenvironment and Tumor of Chinese Academy of Sciences
Shandong Provincial Natural Science Foundation