
Personalized Pangenome References

Published: 2023-12-14

Authors:

Jouni Sirén, Parsa Eskandar, Matteo Tommaso Ungaro, Glenn Hickey, Jordan M. Eizenga, Adam M. Novak, Xian Chang, Pi-Chuan Chang, Mikhail Kolmogorov, Andrew Carroll, Jean Monlong, Benedict Paten

Abstract

Pangenomes, by including genetic diversity, should reduce reference bias by better representing the new samples that are compared against them. Yet when a new sample is compared to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example by causing false read mappings. These irrelevant variants tend to have low allele frequencies and have previously been handled with allele frequency filters. However, this is a blunt heuristic that both fails to remove some irrelevant variants and removes many relevant ones. We propose a new approach, inspired by local ancestry inference methods, that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. Our approach is tailored to the Giraffe short-read aligner, because the indexes it needs for read mapping can be built quickly. We compare the accuracy of our approach to state-of-the-art methods using graphs from the Human Pangenome Reference Consortium. The resulting personalized pangenome pipelines provide faster pangenome read mapping than comparable pipelines that use a linear reference, reduce small variant genotyping errors fourfold relative to the Genome Analysis Toolkit (GATK) best-practice pipeline, and, for the first time, make short-read structural variant genotyping competitive with long-read discovery methods.
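To make the idea of "sampling local haplotypes according to k-mer counts in the reads" more concrete, the toy sketch below scores the candidate haplotypes of a single block of a pangenome by how well their k-mers agree with k-mers observed in the sample's reads, then keeps the best-supported few. It is a hypothetical simplification, not the published method or its implementation: the function names (`kmers`, `count_read_kmers`, `sample_block`), the +1/-1 scoring rule, the `min_count` threshold, and the choice of k are all illustrative assumptions.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers (substrings of length k) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_read_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_block(haplotypes: list[str], reads: list[str], k: int = 4,
                 n: int = 2, min_count: int = 2) -> list[str]:
    """Hypothetical toy: keep the n candidate haplotypes best supported by read k-mers."""
    read_counts = count_read_kmers(reads, k)
    hap_kmers = [kmers(h, k) for h in haplotypes]
    # k-mers shared by every candidate cannot distinguish haplotypes, so only
    # k-mers present in some (but not all) candidates are scored.
    informative = set().union(*hap_kmers) - set.intersection(*hap_kmers)

    def score(i: int) -> int:
        # +1 for each informative k-mer the reads support, -1 for each they lack.
        return sum(1 if read_counts[km] >= min_count else -1
                   for km in hap_kmers[i] & informative)

    # sorted() is stable (also with reverse=True), so ties keep input order.
    ranked = sorted(range(len(haplotypes)), key=score, reverse=True)
    return [haplotypes[i] for i in ranked[:n]]


if __name__ == "__main__":
    # Three candidate haplotypes in one block; the reads come from the last two.
    block = ["ACGTACGTAA", "ACGTTCGTAA", "ACGGACGTAA"]
    reads = ["ACGTTCGTAA", "ACGTTCGTAA", "ACGGACGTAA", "ACGGACGTAA"]
    print(sample_block(block, reads))  # ['ACGTTCGTAA', 'ACGGACGTAA']
```

The unsupported haplotype is dropped because its distinguishing k-mers never appear in the reads, regardless of how common it is in the pangenome; this is the contrast with an allele frequency filter that the abstract draws. A fuller treatment would presumably also use expected sequencing coverage to tell heterozygous from homozygous k-mers and would stitch the chosen local haplotypes into a subgraph, which this sketch does not attempt.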

Publisher

Cold Spring Harbor Laboratory


Cited by 2 articles

1. A Draft Pacific Ancestry Pangenome Reference (2024-08-09)

2. Cluster efficient pangenome graph construction with nf-core/pangenome (2024-05-15)