The Hitchhiker’s Guide to Sequencing Data Types and Volumes for Population-Scale Pangenome Construction

Author:

Sarashetti Prasad, Lipovac Josipa, Tomas Filip, Šikic Mile, Liu Jianjun

Abstract

Long-read (LR) technologies from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have transformed genomics research by providing diverse data types such as HiFi, Duplex, and ultra-long ONT (ULONT). Despite recent strides toward haplotype-phased, gapless genome assemblies using long-read technologies, concerns persist about how well a single reference represents genetic diversity, prompting the development of pangenome references. Pangenome studies, however, must balance data types, data volumes, and cost for each assembled genome while maintaining sensitivity, and the absence of comprehensive guidance on optimal data selection makes this harder. To fill this gap, our study evaluates the available data types, their significance, and the volumes required for robust de novo assembly in population-level pangenome projects. The results show that achieving chromosome-level, haplotype-resolved assembly requires 20x high-quality long reads (HQLR) such as PacBio HiFi or ONT duplex, combined with 15-20x of ULONT per haplotype and 30x of long-range data such as Omni-C. High-quality long reads from both platforms yield assemblies with comparable contiguity: HiFi excels in NG50 and phasing accuracy, whereas duplex generates more telomere-to-telomere (T2T) contigs. As long-read technologies advance, our study reevaluates the recommended data types and volumes and provides practical guidelines for selecting sequencing platforms and coverage. These guidelines are intended to support the pangenome research community and to benefit genomic studies with broader impact.
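To put the coverage figures above in terms of raw sequencing yield, the following minimal sketch multiplies each fold-coverage target by a genome size. The genome size of 3.1 Gb (roughly a human haploid genome), the helper names `required_yield_gb` and `yield_table`, and the reading of every figure as fold coverage of that genome size are all assumptions for illustration; none of them come from the abstract, and the numbers should be rescaled if "per haplotype" is interpreted differently.

```python
# Illustrative arithmetic only: convert fold-coverage targets into raw yield.
# ASSUMPTION: 3.1 Gb genome size (approximate human haploid genome);
# this value is not stated in the abstract.
ASSUMED_GENOME_SIZE_GB = 3.1

# Coverage targets quoted in the abstract, as (low, high) fold coverage.
COVERAGE_TARGETS = {
    "HQLR (PacBio HiFi or ONT duplex)": (20, 20),
    "ULONT": (15, 20),
    "Long-range (e.g. Omni-C)": (30, 30),
}


def required_yield_gb(coverage, genome_size_gb=ASSUMED_GENOME_SIZE_GB):
    """Return the gigabases of sequence needed for a given fold coverage."""
    return coverage * genome_size_gb


def yield_table(targets=COVERAGE_TARGETS, genome_size_gb=ASSUMED_GENOME_SIZE_GB):
    """Map each data type to its (low, high) required yield in gigabases."""
    return {
        name: (
            required_yield_gb(low, genome_size_gb),
            required_yield_gb(high, genome_size_gb),
        )
        for name, (low, high) in targets.items()
    }


if __name__ == "__main__":
    for name, (low_gb, high_gb) in yield_table().items():
        if low_gb == high_gb:
            print(f"{name}: about {low_gb:.0f} Gb")
        else:
            print(f"{name}: about {low_gb:.0f} to {high_gb:.0f} Gb")
```

Passing a different `genome_size_gb` adapts the same arithmetic to other species or to a diploid total.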

Publisher

Cold Spring Harbor Laboratory
