Author:
Depope Al, Mondelli Marco, Robinson Matthew R.
Abstract
Efficient utilization of large-scale biobank data is crucial for inferring the genetic basis of disease and predicting health outcomes from DNA. Yet we lack accurate methods that scale to data where electronic health records are linked to whole-genome sequence information. To address this issue, we develop a new algorithmic paradigm based on Approximate Message Passing (AMP), specifically tailored for genomic prediction and association testing. Our method yields out-of-sample prediction accuracy comparable to the state of the art on UK Biobank traits, whilst dramatically reducing computational cost, with a 10x speed-up in run time. In addition, AMP theory provides a joint association testing framework, which outperforms the commonly used REGENIE method in a third of the compute time. This first truly large-scale application of the AMP framework lays the foundations for a far wider range of statistical analyses for hundreds of millions of variables measured on millions of people.
Publisher
Cold Spring Harbor Laboratory