Abstract
Motivation
Copy number variants (CNVs) are large deletions or duplications, typically defined as at least 50 to 200 base pairs long. They play an important role in multiple disorders, but accurate calling of CNVs remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process.
Results
We use a regression tree-based approach to call CNVs from whole-genome sequencing (WGS, >18×) variant call-sets in 6,898 samples across four European cohorts, and describe a rich landscape of large variation comprising 1,320 CNVs. Of the detected events, 61.8% have been previously reported in the Database of Genomic Variants. Among high-quality deletions, 23% affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. To assess the potential clinical impact of the detected CNVs, we test for association between the detected deletions and the levels of 275 proteins in 1,457 individuals. We describe the LD structure and copy number variation underlying the association between levels of the CCL3 protein and a complex structural variant (MAF = 0.15, p = 3.6×10⁻¹²) affecting CCL3L3, a paralog of the CCL3 gene. We also identify a cis-association between a low-frequency NOMO1 deletion and the protein product of this gene (MAF = 0.02, p = 2.2×10⁻⁷). No cis or trans protein quantitative trait locus (pQTL) driven by single nucleotide variants has been documented for this protein to date. This work demonstrates that existing population-wide WGS call-sets can be mined for CNVs with minimal computational overhead. It delivers insight into a less well-studied, yet potentially impactful, class of genetic variant.
Availability
The regression tree-based approach, UN-CNVc, is available as an R and bash executable on GitHub at https://github.com/agilly/un-cnvc.
Contact
eleftheria.zeggini@helmholtz-muenchen.de; arthur.gilly@helmholtz-muenchen.de
Supplementary Information
Supplementary information is appended.
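A regression tree can be fitted to read depth along genomic position. It partitions the sites into segments of roughly constant mean depth, and segments whose depth departs from the diploid baseline are candidate CNVs. The sketch below illustrates only that general idea, as a plain CART-style, squared-error binary segmentation. It is not the UN-CNVc implementation. It assumes per-site depth values have already been extracted from a call-set and normalized so that 1.0 is the diploid expectation. The function names, `min_size`, `min_gain`, and the 0.75/1.25 calling thresholds are hypothetical choices for illustration.

```python
from statistics import mean

# Illustrative sketch only; not the UN-CNVc implementation.
# Hypothetical calling thresholds on normalized depth (1.0 = diploid baseline).
DEL_MAX = 0.75  # a heterozygous deletion is expected near 0.5
DUP_MIN = 1.25  # a single-copy duplication is expected near 1.5


def best_split(y, start, end, min_size):
    """Find the split of y[start:end] that most reduces the sum of squared errors.

    Returns (gain, split_index); split_index is None if no split improves the fit.
    """
    seg = y[start:end]
    n = len(seg)
    total, total_sq = sum(seg), sum(v * v for v in seg)
    sse_parent = total_sq - total * total / n
    best_gain, best_idx = 0.0, None
    left_sum = left_sq = 0.0
    for i in range(1, n):
        # Left child is seg[:i], right child is seg[i:].
        left_sum += seg[i - 1]
        left_sq += seg[i - 1] ** 2
        if i < min_size or n - i < min_size:
            continue  # enforce a minimum segment length on both sides
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        sse_children = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        gain = sse_parent - sse_children
        if gain > best_gain:
            best_gain, best_idx = gain, start + i
    return best_gain, best_idx


def segment(y, min_size=5, min_gain=0.1, start=0, end=None):
    """Recursively split sites into constant-mean segments (a 1-D regression tree)."""
    if end is None:
        end = len(y)
    if end <= start:
        return []
    gain, split = best_split(y, start, end, min_size)
    if split is None or gain < min_gain:
        return [(start, end, mean(y[start:end]))]  # leaf: one segment
    return (segment(y, min_size, min_gain, start, split)
            + segment(y, min_size, min_gain, split, end))


def call_cnvs(depth_ratio, min_size=5, min_gain=0.1):
    """Label leaf segments whose mean normalized depth departs from 1.0."""
    calls = []
    for s, e, m in segment(depth_ratio, min_size, min_gain):
        if m < DEL_MAX:
            calls.append((s, e, "DEL"))
        elif m > DUP_MIN:
            calls.append((s, e, "DUP"))
    return calls


if __name__ == "__main__":
    # Toy data: 50 sites, with a heterozygous-like deletion at sites 20-29.
    depth = [1.0] * 20 + [0.5] * 10 + [1.0] * 20
    print(call_cnvs(depth))  # [(20, 30, 'DEL')]
```

Indices here refer to positions in the site array, not genomic coordinates. On real, noisy depth data, `min_size` and `min_gain` act as the regularization that keeps the tree from splitting on noise. Sensible values would have to be tuned to coverage and site density.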
Publisher
Cold Spring Harbor Laboratory