Abstract
Motivation: Advances in sequencing technologies have enabled the early detection of genetic diseases and the development of personalized medicine. However, the variance explained by genetic variants is typically small compared to heritability estimates. Consequently, there is a pressing need for improved polygenic risk score (PRS) prediction models. We seek an approach that overcomes the limitations of the additive model routinely used for PRS.

Results: Here we present DROP-DEEP, a novel method for calculating PRS that explains more of the heritable variance of complex traits by incorporating high-dimensional genetic interactions. The first stage of DROP-DEEP uses an unsupervised approach to reduce dimensionality, and the second stage trains a prediction model with a supervised machine-learning algorithm. Notably, the first stage is phenotype-agnostic. Thus, although it is computationally intensive, it is performed only once, and its output can serve as input for predicting any chosen trait or disease. We evaluated DROP-DEEP dimensionality reduction models based on principal component analysis (PCA) and deep neural networks (DNN). All models were trained on the UK Biobank (UKB) dataset, with over 340,000 subjects and approximately 460,000 single nucleotide variants (SNVs) across the genome. DROP-DEEP, which was first established for patients diagnosed with hypertension, outperformed other approaches. We extended the analysis to five additional binary and continuous phenotypes, each repeated five times to assess reproducibility. For each phenotype, DROP-DEEP results were compared to commonly used PRS methodologies, and the performance of all models is discussed.

Conclusion: Our approach overcomes the need for variable selection while maintaining computational feasibility. We conclude that the DROP-DEEP approach exhibits significant advantages compared to commonly used PRS methods and can be used efficiently for hundreds of genetic traits.

Availability and Implementation: All code and the trained dimensionality reduction models are available at: https://github.com/HadasaK1/DROP-DEEP.
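
The two-stage design described in the abstract can be illustrated with a minimal sketch. This is not the DROP-DEEP implementation: it assumes NumPy, uses small synthetic genotypes in place of UKB data, uses plain PCA for the first stage (the DNN variant is not shown), and substitutes ridge regression for the unspecified supervised model in the second stage. All function names, array sizes, and both traits are hypothetical. PCA is linear, so this sketch only shows how a single phenotype-agnostic reduction can be reused across traits, not how interactions are captured.

```python
import numpy as np


def fit_pca(genotypes, n_components):
    """Stage 1 (phenotype-agnostic): learn a linear projection from genotypes only."""
    mean = genotypes.mean(axis=0)
    # Rows of vt are the principal axes of the centered genotype matrix.
    _, _, vt = np.linalg.svd(genotypes - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(genotypes, mean, components):
    """Map genotypes into the reduced space learned in stage 1."""
    return (genotypes - mean) @ components.T


def fit_ridge(features, phenotype, alpha=1.0):
    """Stage 2 (phenotype-specific): closed-form ridge regression with an intercept."""
    design = np.column_stack([np.ones(len(features)), features])
    penalty = alpha * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(design.T @ design + penalty, design.T @ phenotype)


def predict(features, weights):
    """Apply fitted weights (intercept first) to reduced features."""
    return weights[0] + features @ weights[1:]


# Synthetic stand-in data: allele counts in {0, 1, 2}; sizes are arbitrary.
rng = np.random.default_rng(0)
n_subjects, n_snvs, n_components = 200, 50, 10
genotypes = rng.integers(0, 3, size=(n_subjects, n_snvs)).astype(float)

# Stage 1 runs once, without reference to any phenotype.
mean, components = fit_pca(genotypes, n_components)
reduced = project(genotypes, mean, components)

# Stage 2 reuses the same reduced matrix for each (hypothetical) trait.
traits = {
    "trait_a": genotypes[:, :5].sum(axis=1) + rng.normal(0.0, 0.5, n_subjects),
    "trait_b": genotypes[:, 5] * genotypes[:, 6] + rng.normal(0.0, 0.5, n_subjects),
}
scores = {name: predict(reduced, fit_ridge(reduced, y)) for name, y in traits.items()}
```

In this sketch only `fit_ridge` is repeated per trait; `reduced` is computed a single time, which mirrors the claim that the expensive first stage is performed only once.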
Publisher
Cold Spring Harbor Laboratory