Abstract
AbstractAssembly of a high-quality genome is important for downstream comparative and functional genomic studies. However, most tools for genome assembly assessment only give qualitative reports, which do not pinpoint assembly errors at specific regions. Here, we develop a new reference-free tool, Clipping information for Revealing Assembly Quality (CRAQ), which maps raw reads back to assembled sequences to identify regional and structural assembly errors based on effective clipped alignment information. Error counts are transformed into corresponding assembly evaluation indexes to reflect the assembly quality at single-nucleotide resolution. Notably, CRAQ distinguishes assembly errors from heterozygous sites or structural differences between haplotypes. This tool can clearly indicate low-quality regions and potential structural error breakpoints; thus, it can identify misjoined regions that should be split for further scaffold building and improvement of the assembly. We have benchmarked CRAQ on multiple genomes assembled using different strategies, and demonstrated the misjoin correction for improving the constructed pseudomolecules.
Publisher
Springer Science and Business Media LLC
Subject
General Physics and Astronomy,General Biochemistry, Genetics and Molecular Biology,General Chemistry,Multidisciplinary
Reference65 articles.
1. Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133 (2009).
2. Eisenstein, M. Oxford Nanopore announcement sets sequencing sector abuzz. Nat. Biotechnol. 30, 295–296 (2012).
3. Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).
4. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–45 (2018).
5. Miga, K. H. et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 32, 608–15 (2020).
Cited by
12 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献