Abstract
Long-read (LR) technologies from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have transformed genomics research by providing diverse data types such as HiFi, Duplex, and ultra-long ONT (ULONT). Despite recent strides toward haplotype-phased, gapless genome assemblies using long-read technologies, concerns persist regarding the representation of genetic diversity, prompting the development of pangenome references. Pangenome studies, however, must balance data types, data volumes, and the cost of each assembled genome while maintaining sensitivity, and the absence of comprehensive guidance on optimal data selection exacerbates these challenges. To fill this gap, our study evaluates the available data types, their significance, and the volumes required for robust de novo assembly in population-level pangenome projects. The results show that achieving chromosome-level haplotype-resolved assembly requires 20x high-quality long reads (HQLR) such as PacBio HiFi or ONT duplex, combined with 15-20x of ULONT per haplotype and 30x of long-range data such as Omni-C. High-quality long reads from both platforms yield assemblies with comparable contiguity: HiFi excels in NG50 and phasing accuracy, while duplex generates more telomere-to-telomere (T2T) contigs. As long-read technologies advance, our study reevaluates the recommended data types and volumes and provides practical guidelines for selecting sequencing platforms and coverage. These insights are intended to support the pangenome research community and to advance genomic studies with broader impact.
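The coverage figures and the NG50 metric mentioned above lend themselves to a small worked example. The following is a minimal sketch, not the authors' pipeline: the function names, the 3.1 Gb genome size (an approximate human haploid size, which the abstract does not specify), and the choice of 15x from the quoted 15-20x ULONT range are all assumptions made for illustration. The sketch also does not decide how "per haplotype" should be applied; pass whatever coverage and genome size match your own interpretation.

```python
def required_yield_gb(coverage_x: float, genome_size_gb: float) -> float:
    """Sequencing yield in gigabases needed to reach a given fold coverage."""
    return coverage_x * genome_size_gb


def ng50(contig_lengths: list[int], genome_size: int) -> int:
    """NG50: length of the contig at which the cumulative length of contigs,
    sorted longest first, reaches half of the estimated genome size.
    Returns 0 if the assembly never covers half of the genome size."""
    half = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return 0


# Assumption: approximate human haploid genome size; not stated in the abstract.
GENOME_SIZE_GB = 3.1

# Coverage values quoted in the abstract (15x is the lower end of 15-20x ULONT).
plan = {"HQLR (HiFi or duplex)": 20, "ULONT": 15, "Omni-C": 30}

for data_type, coverage in plan.items():
    gb = required_yield_gb(coverage, GENOME_SIZE_GB)
    print(f"{data_type}: {coverage}x -> {gb:.1f} Gb")
```

Unlike N50, NG50 is computed against the estimated genome size and not the total assembly length, which is why the function takes `genome_size` as a separate argument; this keeps the metric comparable across assemblies of the same genome.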
Publisher
Cold Spring Harbor Laboratory