Abstract
Background: Nanopore sequencing enables selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research; classifying species from a pool of species is one example. Existing selective-sequencing methods for species classification are still immature, and their accuracy varies widely across datasets. For the five datasets we tested, the accuracy of existing methods ranged from ~77% to ~97% (average accuracy <89%). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet uses novel data preprocessing techniques and an improved neural network architecture for regularization.

Results: For the five datasets tested, DeepSelectNet's accuracy ranged from ~91% to ~99% (average accuracy ~95%). At its best, DeepSelectNet achieved a nearly 12% increase in accuracy over its deep-learning-based predecessor, SquiggleNet. Furthermore, the precision and recall of DeepSelectNet were always >89% across the datasets (average ~95%). In terms of execution performance, DeepSelectNet outperformed SquiggleNet by ~13% on average. Thus, DeepSelectNet is a practically viable method for improving the effectiveness of selective sequencing.

Conclusions: Compared with its base-alignment and deep-learning predecessors, DeepSelectNet can significantly improve accuracy, enabling real-time species classification using selective sequencing. The source code of DeepSelectNet is available at https://github.com/AnjanaSenanayake/DeepSelectNet.
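To make the reported metrics concrete, the following is a minimal sketch of how accuracy, precision, and recall are computed when selective sequencing is treated as binary read classification. It assumes that a read from the target species is the "positive" class. The abstract does not state which class is positive, so this is an assumption. The `ConfusionCounts` class and the counts used here are hypothetical illustrations, not DeepSelectNet code or results from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion counts (assumption: positive = read from the target species)."""
    tp: int  # target-species reads correctly kept
    fp: int  # non-target reads wrongly kept
    tn: int  # non-target reads correctly rejected
    fn: int  # target-species reads wrongly rejected


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all reads whose keep/reject decision was correct.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    # Of the reads kept, the fraction that truly came from the target species.
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: ConfusionCounts) -> float:
    # Of the target-species reads, the fraction that was kept.
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


# Hypothetical counts for illustration only (not results from the paper).
counts = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy={accuracy(counts):.3f} "
      f"precision={precision(counts):.3f} recall={recall(counts):.3f}")
# prints "accuracy=0.925 precision=0.947 recall=0.900"
```

Reporting precision and recall alongside accuracy matters for selective sequencing. A classifier can score well on accuracy while still discarding many target reads (low recall) or keeping many unwanted ones (low precision), especially when the species pool is imbalanced.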
Publisher: Cold Spring Harbor Laboratory