A genome-wide mutational constraint map quantified from variation in 76,156 human genomes
Author:
Chen Siwei, Francioli Laurent C., Goodrich Julia K., Collins Ryan L., Kanai Masahiro, Wang Qingbo, Alföldi Jessica, Watts Nicholas A., Vittal Christopher, Gauthier Laura D., Poterba Timothy, Wilson Michael W., Tarasova Yekaterina, Phu William, Yohannes Mary T., Koenig Zan, Farjoun Yossi, Banks Eric, Donnelly Stacey, Gabriel Stacey, Gupta Namrata, Ferriera Steven, Tolonen Charlotte, Novod Sam, Bergelson Louis, Roazen David, Ruano-Rubio Valentin, Covarrubias Miguel, Llanwarne Christopher, Petrillo Nikelle, Wade Gordon, Jeandet Thibault, Munshi Ruchi, Tibbetts Kathleen, O’Donnell-Luria Anne, Solomonson Matthew, Seed Cotton, Martin Alicia R., Talkowski Michael E., Rehm Heidi L., Daly Mark J., Tiao Grace, Neale Benjamin M., MacArthur Daniel G., Karczewski Konrad J.
Abstract
The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders, but attempts to assess constraint for non-protein-coding regions have proven more difficult. Here we aggregate, process, and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome reference dataset, and use this dataset to build a mutational constraint map for the whole genome. We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation across the genome. As expected, protein-coding sequences overall are under stronger constraint than non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association, and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, while non-coding constraint captures additional functional information underrecognized by gene constraint metrics. We demonstrate that this genome-wide constraint map provides an effective approach for characterizing the non-coding genome and improving the identification and interpretation of functional human genetic variation.
Publisher
Cold Spring Harbor Laboratory
Cited by
119 articles.