Abstract
High-throughput sequencing techniques and sequence analysis have enabled the taxonomic classification of pathogens present in clinical samples. Sequencing provides unbiased identification and systematic classification of pathogens, generally by comparing novel sequences against pre-existing annotated reference databases. However, this approach is limited by the size of the reference databases, which demand considerable computational resources and expertise to search. Alternative robust methods such as machine learning are now employed in genome sequence analysis and classification, and they can be applied to classifying SARS-CoV-2 variants, whose continued evolution has resulted in the emergence of multiple variants.

We developed a deep learning Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model to classify dominant SARS-CoV-2 variants (omicron, delta, beta, gamma and alpha) based on gene sequences of the surface glycoprotein (spike gene). We trained and validated the model using more than 26,000 SARS-CoV-2 sequences from the GISAID database. The model was evaluated on 3,057 unseen SARS-CoV-2 sequences and compared to an existing molecular epidemiology tool, Nextclade.

Our model achieved an accuracy of 98.55% on the training dataset, 99.19% on the validation dataset and 98.41% on the test dataset. Compared with Nextclade, the proposed model classified SARS-CoV-2 variants in unseen data with high accuracy. Nextclade identified recombinant strains in the evaluation data, which the proposed model did not detect.

This study provides an alternative to pre-existing methods employed in the classification of SARS-CoV-2 variants. Timely classification will enable effective monitoring and tracking of SARS-CoV-2 variants and inform public health policies for the control and management of the COVID-19 pandemic.
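As an illustration of how nucleotide sequences are commonly prepared for a sequence classifier such as a CNN-LSTM, the sketch below one-hot encodes a sequence to a fixed length and computes classification accuracy. The abstract does not state the encoding, sequence length, or label order the authors used, so `one_hot_encode`, `accuracy`, `VARIANTS`, and the padding scheme are all assumptions made for this example rather than the study's method.

```python
from typing import List, Sequence

# Assumption: the five variant labels named in the abstract; the order is arbitrary.
VARIANTS = ["alpha", "beta", "gamma", "delta", "omicron"]
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence: str, length: int) -> List[List[int]]:
    """Encode a nucleotide string as a `length` x 4 matrix.

    Ambiguous bases (for example N) and padding positions become all-zero rows.
    Sequences longer than `length` are truncated.
    """
    rows: List[List[int]] = []
    for base in sequence.upper()[:length]:
        row = [0, 0, 0, 0]
        index = NUCLEOTIDES.find(base)
        if index >= 0:
            row[index] = 1
        rows.append(row)
    # Pad short sequences with zero rows so every input has the same shape.
    rows.extend([0, 0, 0, 0] for _ in range(length - len(rows)))
    return rows


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

A fixed-length matrix like this is the kind of input a convolutional layer followed by an LSTM layer would typically consume; the accuracy function corresponds to the metric reported for the training, validation and test datasets.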
Publisher
Cold Spring Harbor Laboratory