Abstract
Motivation
Basecalling long DNA sequences is a crucial step in nanopore-based DNA sequencing protocols. Recently, the CTC-RNN model has become the leading basecalling model, replacing earlier hidden Markov models (HMMs) that relied on pre-segmentation of the ion current measurements. However, CTC-RNN relies entirely on the neural network to make basecalling decisions and does not incorporate prior knowledge of the underlying biological and physical processes. We believe there is untapped potential in using this prior knowledge to improve basecalling performance, and that HMMs can leverage this information when combined with an RNN in an end-to-end algorithm.
Results
We present a basecaller named Lokatt: explicit duration Markov model and residual-LSTM network. It is based on an explicit duration HMM (EDHMM) designed to model the nanopore sequencing process. Trained on a new library created from a methylation-free E. coli genome with MinION R9.4.1 chemistry, the Lokatt basecaller achieves a median single-read identity score of 92%, on par with the existing state of the art.
Availability
https://github.com/chunxxc/lokatt
Contact
chunx@kth.se
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.