Abstract
Deep phenotyping can enhance the power of genetic analyses such as genome-wide association studies (GWAS), but the occurrence of missing phenotypes compromises the potential of such resources. Although many phenotypic imputation methods have been developed, accurate imputation for millions of individuals remains extremely challenging. In the present study, leveraging efficient machine learning (ML)-based algorithms, we developed a novel multi-phenotype imputation method based on mixed fast random forest (PIXANT), which is several orders of magnitude more efficient in runtime and computer memory usage than state-of-the-art methods when applied to the UK Biobank (UKB) data, and which scales to cohorts with millions of individuals. Our simulations with hundreds of individuals showed that PIXANT was superior or comparable in accuracy to the most advanced methods available. We also applied PIXANT to impute 425 phenotypes for the UKB data of 277,301 unrelated white British citizens and performed GWAS on the imputed phenotypes, identifying 18.4% more GWAS loci than before imputation (8,710 vs 7,355). Owing to the increased statistical power of GWAS, a number of novel genes were identified, such as RNF220, SCN10A and RGS6, which affect heart rate, demonstrating the use of imputed phenotype data in a large cohort to discover novel genes for complex traits.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.