Affiliation:
1. College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
2. BGI Research, Shenzhen, China
3. MGI Tech, Shenzhen, China
4. China National GeneBank, BGI Research, Shenzhen, China
5. BGI Research, Qingdao, China
Abstract
Background: As whole‐genome sequencing (WGS) technology advances, massively parallel sequencing (MPS) remains the mainstream approach because of its accuracy, low cost, and high throughput. Developing analytical pipelines for MPS data has therefore always been of great importance. Increasingly large population genomics studies, as a specific type of big data research, pose new challenges for analysis solutions.
Results: Here, we introduce ZBOLT, a comprehensive analysis system that incorporates both software and hardware advancements, making it an appropriate choice for large‐scale population genomic studies that require extensive data processing. In this study, we first evaluate ZBOLT's variant calling accuracy using the Genome in a Bottle (GIAB) benchmark dataset. We then apply ZBOLT to a large‐scale population genomics study of 5,616 high‐sequencing‐depth samples totaling 1.16 Pbp (petabase pairs). The results show that ZBOLT is highly efficient and consumes little energy, processing 100 Tbp per day and using 1 kWh per 100 Gbp sequenced sample.
Conclusion: This research serves as a valuable reference for analyzing sequencing data from large population cohorts and underscores the significant potential of ZBOLT in large‐scale population genomics studies.
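The scale figures quoted in the abstract can be related to one another with a short back-of-envelope calculation. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes that throughput is sustained at the quoted rate and that energy use scales linearly with the number of bases processed, neither of which the abstract states.

```python
# Back-of-envelope estimates from the figures quoted in the abstract.
# Assumptions (not from the source): constant throughput and energy use
# that scales linearly with bases processed.

TOTAL_BP = 1.16e15          # 1.16 Pbp across the whole cohort
N_SAMPLES = 5616            # number of samples in the cohort
BP_PER_DAY = 100e12         # quoted throughput: 100 Tbp per day
KWH_PER_100_GBP = 1.0       # quoted energy: 1 kWh per 100 Gbp sample


def cohort_estimates(total_bp, n_samples, bp_per_day, kwh_per_100_gbp):
    """Return (mean Gbp per sample, processing days, total energy in kWh)."""
    mean_gbp_per_sample = total_bp / n_samples / 1e9
    processing_days = total_bp / bp_per_day
    total_kwh = total_bp / 100e9 * kwh_per_100_gbp
    return mean_gbp_per_sample, processing_days, total_kwh


mean_gbp, days, kwh = cohort_estimates(
    TOTAL_BP, N_SAMPLES, BP_PER_DAY, KWH_PER_100_GBP
)
print(f"mean data per sample: {mean_gbp:.1f} Gbp")
print(f"processing time at quoted throughput: {days:.1f} days")
print(f"energy at quoted rate: {kwh:.0f} kWh")
```

Under these assumptions, the cohort averages a little over 200 Gbp per sample, would take roughly 11.6 days to process at the quoted throughput, and would consume on the order of 11,600 kWh.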
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Natural Science Foundation of Guangdong Province