Abstract
Rare variants, which conventional GWASs largely ignore, can explain part of the heritability of complex traits. The emergence of large-scale population sequencing data provides opportunities to study rare variants. However, few studies have systematically evaluated the extent to which imputation using sequencing data can improve the power of rare variant association studies. Using whole genome sequencing (WGS) data (n = 150,119) in the UK Biobank as the ground truth, we described the landscape of rare variants in SNP array data (n = 488,377) imputed from the TOPMed and HRC+UK10K reference panels and evaluated their consistency with WGS. TOPMed imputation covered more rare variants, and its imputation quality could reach 0.5 even for extremely rare variants. TOPMed-imputed data were closer to WGS than HRC+UK10K-imputed data in all minor allele count (MAC) intervals for three ethnicities (average Cramér's V > 0.75). Furthermore, we performed association tests on 30 quantitative and 15 binary traits. Compared to WGS data, the number of rare variants identified in TOPMed-imputed data increased by 27.71% for quantitative traits and by ∼10-fold for binary traits. In gene-based analysis, the number of signals in TOPMed-imputed data increased by 111.45% for quantitative traits; for binary traits, TOPMed-imputed data identified 15 genes in total, while WGS identified only 6. Finally, we harmonized SNP array and WGS data for lung cancer and epithelial ovarian cancer. The harmonized data identified more variants and genes than WGS data alone, such as BRCA1, BRCA2, and CHRNA5. Our findings highlight that incorporating rare variants imputed from large-scale sequencing populations could greatly boost the power of GWASs.
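The concordance between imputed and WGS genotypes is summarized above with Cramér's V, which rescales the chi-squared statistic of a contingency table to the range 0 to 1. The sketch below shows how such a value could be computed for one variant, assuming hard-called genotypes coded as 0/1/2 alternate-allele counts for the same individuals in both datasets and no bias correction. The function name `cramers_v` and the input coding are illustrative assumptions, not the authors' pipeline; they may, for example, have aggregated over MAC bins or used dosages differently.

```python
import math

import numpy as np


def cramers_v(x, y):
    """Cramér's V between two categorical vectors, e.g. WGS vs imputed genotype calls.

    Assumption: genotypes are hard calls (0/1/2) for the same individuals in the
    same order. Returns NaN when either vector has only one observed category.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    # Contingency table over the categories actually observed (drops empty rows/columns)
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    table = np.zeros((x_idx.max() + 1, y_idx.max() + 1), dtype=float)
    np.add.at(table, (x_idx.ravel(), y_idx.ravel()), 1)
    n = table.sum()
    k = min(table.shape) - 1
    if k == 0:
        return math.nan  # undefined when one variable is constant
    # Expected counts under independence, then the Pearson chi-squared statistic
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(math.sqrt(chi2 / (n * k)))


if __name__ == "__main__":
    wgs = [0, 0, 0, 0, 1, 1]
    imputed = [0, 0, 0, 1, 1, 1]  # one heterozygote miscalled
    print(cramers_v(wgs, imputed))
```

For this toy variant the value is about 0.71, below the 0.75 average reported above; identical calls give 1, and a monomorphic call set gives NaN because V is undefined for a single category, which is one reason extremely rare variants need care when aggregating.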
Publisher
Cold Spring Harbor Laboratory