Species-specific basecallers improve actual accuracy of nanopore sequencing in plants

Published:2022-12-14 Issue:1 Volume:18 Page:
ISSN:1746-4811
Container-title:Plant Methods
language:en
Short-container-title:Plant Methods

Author:

Ferguson Scott,McLay Todd,Andrew Rose L.,Bruhl Jeremy J.,Schwessinger Benjamin,Borevitz Justin,Jones Ashley

Abstract

Background: Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be directly sequenced, but can be limited by lower per-base accuracies. A key step post-sequencing is basecalling, the process of converting raw electrical signals produced by the sequencing device into nucleotide sequences. This is challenging as current basecallers are primarily based on mixtures of model species for training. Here we utilise both ONT PromethION and higher accuracy PacBio Sequel II HiFi sequencing on two plants, Phebalium stellatum and Xanthorrhoea johnsonii, to train species-specific basecaller models with the aim of improving per-base accuracy. We investigate sequencing accuracies achieved by ONT basecallers and assess accuracy gains by training single-species and species-specific basecaller models. We also evaluate accuracy gains from ONT’s improved flowcells (R10.4, FLO-PRO112) and sequencing kits (SQK-LSK112). For the truth dataset for both model training and accuracy assessment, we developed highly accurate, contiguous diploid reference genomes with PacBio Sequel II HiFi reads.

Results: Basecalling with ONT Guppy 5 and 6 super-accurate gave almost identical results, attaining read accuracies of 91.96% and 94.15%. Guppy’s plant-specific model gave highly mixed results, attaining read accuracies of 91.47% and 96.18%. Species-specific basecalling models improved read accuracy, attaining 93.24% and 95.16% read accuracies. R10.4 sequencing kits also improve sequencing accuracy, attaining read accuracies of 95.46% (super-accurate) and 96.87% (species-specific).

Conclusions: The use of a single mixed-species basecaller model, such as ONT Guppy super-accurate, may be reducing the accuracy of nanopore sequencing, due to conflicting genome biology within the training dataset and study species. Training of single-species and genome-specific basecaller models improves read accuracy. Studies that aim to do large-scale long-read genotyping would primarily benefit from training their own basecalling models. Such studies could use sequencing accuracy gains and improving bioinformatics tools to improve study outcomes.

Publisher

Springer Science and Business Media LLC

Subject

Plant Science,Genetics,Biotechnology

Link

https://link.springer.com/content/pdf/10.1186/s13007-022-00971-2.pdf

References: 37 articles.

1. Wang Y, Zhao Y, Bollas A, Wang Y, Au KF. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol. 2021;39(11):1348–65.

2. Fuller CW, Kumar S, Porel M, Chien M, Bibillo A, Stranges PB, et al. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array. Proc Natl Acad Sci. 2016;113(19):5233–8.

3. Silvestre-Ryan J, Holmes I. Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing. Genome Biol. 2021;22(1):38.

4. Simpson JT, Workman RE, Zuzarte PC, David M, Dursi LJ, Timp W. Detecting DNA cytosine methylation using nanopore sequencing. Nat Methods. 2017;14(4):407–10.

5. Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 2020;21(1):30.

Cited by 13 articles.

1. Combining short and long read sequencing technologies to identify SARS-CoV-2 variants in wastewater;2024-08-08

2. A diploid chromosome-level genome of Eucalyptus regnans: unveiling haplotype variance in structure and genes within one of the world’s tallest trees;2024-06-29

3. Sequencing accuracy and systematic errors of nanopore direct RNA sequencing;BMC Genomics;2024-05-28

4. Advances of high-throughput sequencing for unraveling biotechnological potential of microalgal-bacterial communities;Journal of Applied Phycology;2024-05-18

5. Streamlining remote nanopore data access with slow5curl;GigaScience;2024