Automatic Identification of SARS Coronavirus using Compression-Complexity Measures

Published: 2020-03-27

Author:

Balasubramanian Karthi, Nagaraj Nithin

Abstract

Finding a vaccine or specific antiviral treatment for a global pandemic of viral disease (such as the ongoing COVID-19) requires rapid analysis, annotation and evaluation of metagenomic libraries to enable quick and efficient screening of nucleotide sequences. Traditional sequence alignment methods are not suitable, and there is a need for fast alignment-free techniques for sequence analysis. Information theory and data compression algorithms provide a rich set of mathematical and computational tools to capture essential patterns in biological sequences. In 2013, our research group (Nagaraj et al., Eur. Phys. J. Special Topics 222(3-4), 2013) proposed a novel measure known as Effort-To-Compress (ETC), based on the notion of compression-complexity, to capture the information content of sequences. In this study, we propose a compression-complexity based distance measure for automatic identification of SARS coronavirus strains from a set of viruses using only short fragments of nucleotide sequences. We also demonstrate that our proposed method can correctly distinguish SARS-CoV-2 from SARS-CoV-1 viruses by analyzing very short segments of nucleotide sequences. This work could be extended to enable medical practitioners to identify and characterize SARS coronavirus strains automatically, quickly and efficiently, using short and/or incomplete segments of nucleotide sequences. Potentially, the need for sequence assembly can be circumvented.

Note: The main ideas and results of this research were first presented at the International Conference on Nonlinear Systems and Dynamics (CNSD-2013) held at the Indian Institute of Technology, Indore, on December 12, 2013. In this manuscript, we have extended our preliminary analysis to include the SARS-CoV-2 virus as well.

Publisher

Cold Spring Harbor Laboratory

References (43 articles)

1. The Genome Sequence of the SARS-Associated Coronavirus

2. Characterization of a Novel Coronavirus Associated with Severe Acute Respiratory Syndrome

3. Toward an alignment-free method for feature extraction and accurate classification of viral sequences; Journal of Computational Biology, 2019

4. An alignment-free measure based on physicochemical properties of amino acids for protein sequence comparison; Computational Biology and Chemistry, 2019

5. Alignment-free sequence analysis and applications; Annual Review of Biomedical Data Science, 2018