Abstract
Human pangenomes contain assemblies of non-reference copy-number variable (CNV) genes. We developed a new method, ctyper, to identify the copy number of specific alleles of CNV genes cataloged in pangenomes using NGS datasets. Applying ctyper to the 1000 Genomes samples revealed population stratification of paralogs and two classes of CNVs: recent CNVs due to ongoing duplications, and polymorphic CNVs from non-reference ancient paralogs. Expression quantitative trait locus analysis determined allele-specific expression within gene families, revealing that 7.94% of paralogs and 3.28% of orthologs had significantly divergent expression. Case studies of individual genes include finding lower expression of SMN-1 copies that arose from conversion from SMN-2, and increased expression of a form of AMY2B that has undergone a translocation. Moreover, 4.7% of paralogs and 1.2% of orthologs had different most-expressed tissues. Furthermore, the genotypes explain more expression variance than known eQTL variants. Overall, ctyper enables biobank-scale genotyping of sequence-resolved CNVs.
Publisher
Cold Spring Harbor Laboratory