Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences

Published: 2022-12-15 Issue: 12 Volume: 12 Page: 3181
ISSN: 2075-4418
Container-title: Diagnostics
Language: en

Author:

Erten Mehmet, Acharya Madhav R., Kamath Aditya P., Sampathila Niranjana, Bairy G. Muralidhar, Aydemir Emrah, Barua Prabal Datta, Baygin Mehmet, Tuncer Ilknur, Dogan Sengul, Tuncer Turker

Abstract

SARS-CoV-2 and Influenza-A can present similar symptoms. Computer-aided diagnosis can help facilitate screening for the two conditions, and may be especially relevant and useful in the current COVID-19 pandemic because seasonal Influenza-A infection can still occur. We have developed a novel text-based classification model for discriminating between the two conditions using protein sequences of varying lengths. We downloaded viral protein sequences of SARS-CoV-2 and Influenza-A with varying lengths (all 100 or greater) from the NCBI database and randomly selected 16,901 SARS-CoV-2 and 19,523 Influenza-A sequences to form a two-class study dataset. We used a new feature extraction function based on a unique pattern, HamletPat, generated from the text of Shakespeare’s Hamlet, and a signum function to extract local binary pattern-like bits from overlapping fixed-length (27) blocks of the protein sequences. The bits were converted to decimal map signals from which histograms were extracted and concatenated to form a final feature vector of length 1280. The iterative Chi-square function selected the 340 most discriminative features to feed to an SVM with a Gaussian kernel for classification. The model attained 99.92% and 99.87% classification accuracy rates using hold-out (75:25 split ratio) and five-fold cross-validations, respectively. The excellent performance of the lightweight, handcrafted HamletPat-based classification model suggests that it can be a valuable tool for screening protein sequences to discriminate between SARS-CoV-2 and Influenza-A infections.
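The feature extraction step described in the abstract can be illustrated with a rough sketch. The abstract does not say how HamletPat is derived from the text of Hamlet, how amino acids are mapped to numbers, or how the 1280 features are laid out, so everything below is an assumption made for illustration: `make_pattern` is a hypothetical stand-in that picks fixed position pairs at random, the residue coding is arbitrary, and the split into 5 groups of 8 bits (5 x 256 = 1280) is chosen only so that the output length matches the figure in the abstract.

```python
import random

# Assumption: an arbitrary integer code per amino acid; unknown residues map to 0.
AMINO = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i + 1 for i, aa in enumerate(AMINO)}

BLOCK = 27    # block length stated in the abstract
N_GROUPS = 5  # assumption: 5 groups x 256 bins = 1280 features
BITS = 8      # assumption: 8 bits per group, as in classic local binary patterns


def make_pattern(seed=0):
    """Hypothetical stand-in for HamletPat: fixed pairs of positions in a block."""
    rng = random.Random(seed)
    return [
        [tuple(rng.sample(range(BLOCK), 2)) for _ in range(BITS)]
        for _ in range(N_GROUPS)
    ]


def signum_bit(a, b):
    """Return 1 when a >= b, else 0 (a common LBP-style signum convention)."""
    return 1 if a - b >= 0 else 0


def extract_features(seq, pattern):
    """Build concatenated histograms of LBP-like codes over overlapping blocks."""
    values = [CODE.get(ch, 0) for ch in seq.upper()]
    hist = [0] * (N_GROUPS * 2 ** BITS)
    # Overlapping blocks with stride 1; sequences shorter than BLOCK yield no blocks.
    for start in range(len(values) - BLOCK + 1):
        block = values[start:start + BLOCK]
        for g, pairs in enumerate(pattern):
            code = 0
            for i, j in pairs:
                # Each compared pair contributes one bit of the decimal map value.
                code = (code << 1) | signum_bit(block[i], block[j])
            hist[g * 2 ** BITS + code] += 1
    return hist


if __name__ == "__main__":
    pattern = make_pattern()
    features = extract_features("ACDEFGHIKLMNPQRSTVWY" * 6, pattern)
    print(len(features))
```

The sketch stops at the feature vector. In the model described above, an iterative Chi-square selector then keeps the 340 most discriminative features, which are passed to an SVM with a Gaussian kernel; those stages are not reproduced here.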

Publisher

MDPI AG

Subject

Clinical Biochemistry

Link

https://www.mdpi.com/2075-4418/12/12/3181/pdf

References: 35 articles.

1. A pneumonia outbreak associated with a new coronavirus of probable bat origin;Zhou;Nature,2020

2. Epidemiology, genetic recombination, and pathogenesis of coronaviruses;Su;Trends Microbiol.,2016

3. Immunopathological similarities between COVID-19 and influenza: Investigating the consequences of Co-infection;Khorramdelazad;Microb. Pathog.,2021

4. Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding;Lu;Lancet,2020

5. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China;Wu;Cell Host Microbe,2020

Cited by 6 articles.

1. Novel tiny textural motif pattern-based RNA virus protein sequence classification model;Expert Systems with Applications;2024-05

2. MehNet: a vigesimal-based model by amino acid melting points generates unique ID numbers for protein sequences;Journal of Biomolecular Structure and Dynamics;2024-01-17

3. Combating the COVID-19 infodemic using Prompt-Based curriculum learning;Expert Systems with Applications;2023-11

4. Automated characterization and detection of fibromyalgia using slow wave sleep EEG signals with glucose pattern and D’hondt pooling technique;Cognitive Neurodynamics;2023-09-12

5. Editorial on Special Issue “Medical Data Processing and Analysis”;Diagnostics;2023-06-16