Graph construction method impacts variation representation and analyses in a bovine super-pangenome-Reference-Cited by-同舟云学术

Graph construction method impacts variation representation and analyses in a bovine super-pangenome

Published:2023-05-22 Issue:1 Volume:24 Page:
ISSN:1474-760X
Container-title:Genome Biology
language:en
Short-container-title:Genome Biol

Author:

Leonard Alexander S.,Crysnanto Danang,Mapel Xena M.,Bhati Meenu,Pausch Hubert^ORCID

Abstract

Abstract Background Several models and algorithms have been proposed to build pangenomes from multiple input assemblies, but their impact on variant representation, and consequently downstream analyses, is largely unknown. Results We create multi-species super-pangenomes using pggb, cactus, and minigraph with the Bos taurus taurus reference sequence and eleven haplotype-resolved assemblies from taurine and indicine cattle, bison, yak, and gaur. We recover 221 k nonredundant structural variations (SVs) from the pangenomes, of which 135 k (61%) are common to all three. SVs derived from assembly-based calling show high agreement with the consensus calls from the pangenomes (96%), but validate only a small proportion of variations private to each graph. Pggb and cactus, which also incorporate base-level variation, have approximately 95% exact matches with assembly-derived small variant calls, which significantly improves the edit rate when realigning assemblies compared to minigraph. We use the three pangenomes to investigate 9566 variable number tandem repeats (VNTRs), finding 63% have identical predicted repeat counts in the three graphs, while minigraph can over or underestimate the count given its approximate coordinate system. We examine a highly variable VNTR locus and show that repeat unit copy number impacts the expression of proximal genes and non-coding RNA. Conclusions Our findings indicate good consensus between the three pangenome methods but also show their individual strengths and weaknesses that need to be considered when analysing different types of variants from multiple input assemblies.

Funder

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

Eidgenössische Technische Hochschule Zürich

Swiss Federal Institute of Technology Zurich

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13059-023-02969-y.pdf

Reference73 articles.

1. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome.” Proc Natl Acad Sci U S A. 2005;102:13950–5.

2. Eizenga JM, Novak AM, Sibbesen JA, Heumos S, Ghaffaari A, Hickey G, et al. Pangenome Graphs. Annu Rev Genomics Hum Genet. 2020;21:139–62.

3. Li H, Feng X, Chu C. The design and construction of reference pangenome graphs. Genome Biol. 2020;21:1–19.