Abstract
AbstractMany bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor’s variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using , we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use to characterize how T2T references improve large-scale bias.
Funder
National Human Genome Research Institute
Publisher
Springer Science and Business Media LLC
Reference38 articles.
1. Anson EL, Myers EW. ReAligner: a program for refining DNA sequence multi-alignments. J Comput Biol. 1997;4(3):369–83.
2. Assmus J, Kleffe J, Schmitt AO, Brockmann GA. Equivalent indels-ambiguous functional classes and redundancy in databases. PLoS ONE. 2013;8(5):e62803.
3. Baid G, Nattestad M, Kolesnikov A, Goel S, Yang H, Chang PC, et al. Google Brain Genomics Sequencing Dataset for Benchmarking and Development. Dataset. 2020. https://console.cloud.google.com/storage/browser/brain-genomics-public/research/sequencing/fastq/novaseq/wgs_pcr_free/30x. Accessed 15 Apr 2024.
4. Brandt DY, Aguiar VR, Bitarello BD, Nunes K, Goudet J, Meyer D. Mapping Bias Overestimates Reference Allele Frequencies at the HLA Genes in the 1000 Genomes Project Phase I Data. G3 (Bethesda). 2015;5(5):931–41.
5. Chen NC, Paulin LF, Sedlazeck FJ, Koren S, Phillippy AM, Langmead B. Improved sequence mapping using a complete reference genome and lift-over. Nat Methods. 2024;21(1):41–9.