Unveiling Genomic Complexity: A Framework for Genome Graph Structural Analysis and Optimised Variant Calling Workflows

Published: 2024-06-11

Author:

Kamaraj Venkatesh, Gupta Ayam, Raman Karthik, Narayanan Manikandan, Sinha Himanshu

Abstract

Genome graphs offer a powerful alternative to linear reference genomes, as they provide a richer representation of a collection of genomes by emphasising the polymorphic regions. Despite their innate advantages, there is a lack of techniques to analyse and visualise the structural complexity of a genome graph. In our study, we formulated a novel framework to characterise the structural properties of a genome graph. Specifically, our framework helps to summarise and visualise the structure of the entire human genome graph in a single figure and to identify genomic loci with increased individual-to-individual variability that are valuable for further research. We applied our framework to examine the structures of two human pan-genome graphs built from 2504 diverse samples in the 1000 Genomes Project: one incorporating only common variants and the other incorporating all variants, including rare ones. As expected, we observed that the rare variants increased the variability of the genome graph by 10-fold and its hypervariability by 50-fold. Our framework highlighted biologically significant regions of the human genome, such as the HLA and DEFB gene loci. We then optimised genome-graph-based variant calling workflows and analysed human whole genomes with the constructed graphs, finding that genome graphs captured 9.83% more variants than the linear reference genome. Interestingly, we observed no significant differences in the variant calling performance of the two genome graphs, suggesting that rare variants had minimal impact. Through the proposed methods, we demonstrated that genome graphs can systematically reveal the underlying genomic complexity of the population or species they represent.
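The abstract does not define how variability or hypervariability is measured, so the following is only a minimal sketch of the general idea of summarising graph structure per genomic region, not the authors' method. It assumes, purely for illustration, that a "variable site" is a node with more than one outgoing edge and that a "hypervariable" bin is one whose count of such nodes reaches a chosen threshold. The function names, the toy graph, the bin size, and the threshold are all hypothetical.

```python
from collections import Counter


def variability_per_bin(edges, position, bin_size):
    """Count branch nodes (out-degree > 1) in each fixed-size genomic bin.

    Assumption: a branch node stands in for a variable site. The paper's
    actual definition is not given in the abstract.
    """
    out_degree = Counter(src for src, _dst in edges)
    counts = Counter()
    for node, degree in out_degree.items():
        if degree > 1:
            counts[position[node] // bin_size] += 1
    return dict(counts)


def flag_hypervariable(bin_counts, threshold):
    """Return the sorted bins whose branch-node count meets the threshold.

    Assumption: the threshold is an arbitrary illustrative cut-off.
    """
    return sorted(b for b, c in bin_counts.items() if c >= threshold)


# Hypothetical toy graph: three bubbles, two of them close together.
edges = [
    ("n1", "n2"), ("n1", "n3"),    # bubble opening at n1
    ("n2", "n4"), ("n3", "n4"),
    ("n4", "n5"), ("n4", "n6"),    # bubble opening at n4
    ("n5", "n7"), ("n6", "n7"),
    ("n7", "n8"),
    ("n8", "n9"), ("n8", "n10"),   # bubble opening at n8
    ("n9", "n11"), ("n10", "n11"),
]
# Hypothetical linear coordinates for each node.
position = {
    "n1": 10, "n2": 11, "n3": 11, "n4": 12, "n5": 40, "n6": 40,
    "n7": 41, "n8": 150, "n9": 151, "n10": 151, "n11": 152,
}

bin_counts = variability_per_bin(edges, position, bin_size=100)
hyper = flag_hypervariable(bin_counts, threshold=2)
print(bin_counts, hyper)
```

In this toy example, the first bin contains two branch nodes and the second contains one, so only the first bin is flagged at a threshold of 2. A real analysis would read the graph from a standard format and choose bin sizes and thresholds from the data.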

Publisher

Cold Spring Harbor Laboratory
