Abstract
Telomere-to-telomere phased assemblies have become a standard expectation. To achieve them for diploid and even polyploid genomes, the contemporary approach relies on at least two long-read sequencing technologies: high-accuracy long reads (HiFi or Duplex nanopore) and ultra-long, noisy nanopore reads. Using two different technologies increases both the cost and the required amount of genomic DNA. Here, we show that comparable results are possible by error-correcting ultra-long nanopore Simplex reads and then assembling them with existing state-of-the-art de novo assembly methods. We have developed HERRO, a deep-learning model that corrects Simplex nanopore reads longer than 10 kbp with a quality value higher than 10. By taking into account informative positions that vary between haplotypes or between segments in duplications, HERRO achieves up to a 100-fold increase in accuracy. Combining HERRO with the Verkko assembler, we achieve high contiguity on human genomes, reconstructing many chromosomes telomere-to-telomere, including chromosomes X and Y. We show that HERRO generalises well to other species and supports both R9.4.1 and R10.4.1 nanopore Simplex reads. These results offer an opportunity to decrease the cost of genome sequencing and to apply corrected reads to more complex genomes with different levels of ploidy or even aneuploidy.
Publisher
Cold Spring Harbor Laboratory