Author:
Collins Ryan L.,Brand Harrison,Karczewski Konrad J.,Zhao Xuefang,Alföldi Jessica,Francioli Laurent C.,Khera Amit V.,Lowther Chelsea,Gauthier Laura D.,Wang Harold,Watts Nicholas A.,Solomonson Matthew,O’Donnell-Luria Anne,Baumann Alexander,Munshi Ruchi,Walker Mark,Whelan Christopher,Huang Yongqing,Brookings Ted,Sharpe Ted,Stone Matthew R.,Valkanas Elise,Fu Jack,Tiao Grace,Laricchia Kristen M.,Ruano-Rubio Valentin,Stevens Christine,Gupta Namrata,Margolin Lauren,Taylor Kent D.,Lin Henry J.,Rich Stephen S.,Post Wendy,Chen Yii-Der Ida,Rotter Jerome I.,Nusbaum Chad,Philippakis Anthony,Lander Eric,Gabriel Stacey,Neale Benjamin M.,Kathiresan Sekar,Daly Mark J.,Banks Eric,MacArthur Daniel G.,Talkowski Michael E.
Abstract
Structural variants (SVs) rearrange large segments of the genome and can have profound consequences for evolution and human diseases. As national biobanks, disease association studies, and clinical genetic testing grow increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral for interpreting genetic variation. To date, no large-scale reference maps of SVs exist from high-coverage sequencing comparable to those available for point mutations in protein-coding genes. Here, we constructed a reference atlas of SVs across 14,891 genomes from diverse global populations (54% non-European) as a component of gnomAD. We discovered a rich landscape of 433,371 distinct SVs, including 5,295 multi-breakpoint complex SVs across 11 mutational subclasses, and examples of localized chromosome shattering, as in chromothripsis. The average individual harbored 7,439 SVs, which accounted for 25-29% of all rare protein-truncating events per genome. We found strong correlations between constraint against damaging point mutations and rare SVs that both disrupt and duplicate protein-coding sequence, suggesting intolerance to reciprocal dosage alterations for a subset of tightly regulated genes. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than any effect on noncoding SVs. Finally, we benchmarked carrier rates for medically relevant SVs, finding very large (≥1 Mb) rare SVs in 3.8% of genomes (~1:26 individuals) and clinically reportable incidental SVs in 0.18% of genomes (~1:556 individuals). These data have been integrated directly into the gnomAD browser (https://gnomad.broadinstitute.org) and will have broad utility for population genetics, disease association, and diagnostic screening.
Publisher
Cold Spring Harbor Laboratory