Measuring, visualizing, and diagnosing reference bias with biastools

Published:2024-04-19 Issue:1 Volume:25
ISSN:1474-760X
Container-title:Genome Biology
language:en
Short-container-title:Genome Biol

Author:

Lin Mao-Jan, Iyer Sheila, Chen Nae-Chyun, Langmead Ben

Abstract

Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor’s variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.
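
As a rough illustration of what measuring reference bias at a site can mean, the sketch below computes the fraction of reads supporting the reference allele at heterozygous sites and flags sites where that fraction sits well above the 0.5 expected without bias. This is a generic allelic-balance calculation, not the biastools implementation; the `SiteCounts` structure, the function names, and the 0.2 tolerance are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SiteCounts:
    """Read counts at one heterozygous site (hypothetical structure)."""
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele


def ref_fraction(site: SiteCounts) -> Optional[float]:
    """Fraction of reads carrying the reference allele, or None if uncovered."""
    total = site.ref_reads + site.alt_reads
    if total == 0:
        return None
    return site.ref_reads / total


def flag_biased_sites(
    sites: Dict[str, SiteCounts],
    expected: float = 0.5,   # expected fraction at an unbiased heterozygous site
    tolerance: float = 0.2,  # assumed threshold, chosen only for illustration
) -> List[str]:
    """Return names of sites whose reference fraction exceeds expected + tolerance."""
    flagged = []
    for name, counts in sites.items():
        frac = ref_fraction(counts)
        if frac is not None and frac - expected > tolerance:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    example = {
        "chr1:100": SiteCounts(ref_reads=9, alt_reads=1),
        "chr1:200": SiteCounts(ref_reads=5, alt_reads=5),
        "chr1:300": SiteCounts(ref_reads=0, alt_reads=0),
    }
    print(flag_biased_sites(example))
```

A single threshold on the reference fraction cannot tell mapping bias apart from sampling noise at low coverage, so a real analysis would also need to account for read depth.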

Funder

National Human Genome Research Institute

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13059-024-03240-8.pdf
