Abstract
Despite great progress in the identification of neurodevelopmental disorder (NDD) risk genes, there are thousands that remain to be discovered. Computational tools that provide accurate gene-level predictions of NDD risk can significantly reduce the costs and time needed to prioritize and discover novel NDD risk genes. Here, we first demonstrate that machine learning models trained solely on single-cell RNA-sequencing data from the developing human cortex can robustly predict genes implicated in autism spectrum disorder (ASD), developmental and epileptic encephalopathy (DEE), and developmental delay (DD). Strikingly, we find differences in gene expression patterns of genes with monoallelic and biallelic inheritance patterns. We then integrate these expression data with 300 orthogonal features in a semi-supervised machine learning framework (mantis-ml) to train inheritance-specific models for ASD, DEE, and DD. The models have high predictive power (AUCs: 0.84 to 0.95) and top-ranked genes were up to two-fold (monoallelic models) and six-fold (biallelic models) more enriched for high-confidence NDD risk genes than genic intolerance metrics. Across all models, genes in the top decile of predicted risk genes were 60 to 130 times more likely to have publications strongly linking them to the phenotype of interest in PubMed compared to the bottom decile. Collectively, this work provides highly robust novel NDD risk gene predictions that can complement large-scale gene discovery efforts and underscores the importance of incorporating inheritance into gene risk prediction tools (https://nddgenes.com).
Publisher
Cold Spring Harbor Laboratory
Cited by
5 articles.