Abstract
One of the first steps in protein sequence analysis is comparing sequences to look for similarities. We propose an information-theoretic distance for comparing cellular automata that represent protein sequences and for determining their similarities. Our approach relies on a stationary Hamming distance computed from the evolution of the automata under a properly chosen rule. We use this distance to build a pairwise similarity matrix and to determine common ancestors among different species, with simpler and less computationally demanding code than other methods.
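The pipeline outlined in the abstract (encode each sequence as an automaton state, evolve under a fixed rule, measure a Hamming distance once transients have passed, collect the pairwise values in a matrix) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the encoding, the rule, the lattice size, or how the stationary distance is estimated. The 5-bit amino-acid encoding, the elementary rule 30, the width of 101 cells, the periodic boundaries, the transient and averaging window lengths, and all function names below are hypothetical choices made only so the example runs.

```python
from itertools import combinations

# Assumption: the 20 standard amino acids, each encoded by its index as 5 bits.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(seq, width):
    """Map a protein sequence to a list of bits, truncated or zero-padded to `width`."""
    bits = []
    for aa in seq:
        idx = AMINO_ACIDS.index(aa)
        bits.extend((idx >> k) & 1 for k in range(4, -1, -1))
    bits = bits[:width]
    return bits + [0] * (width - len(bits))


def step(cells, rule=30):
    """One synchronous update of an elementary cellular automaton (periodic boundaries)."""
    n = len(cells)
    return [
        (rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def hamming(a, b):
    """Fraction of cells in which two equal-length configurations differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def stationary_distance(seq_a, seq_b, width=101, transient=100, window=100, rule=30):
    """Time-averaged Hamming distance after discarding `transient` steps (assumed estimator)."""
    a, b = encode(seq_a, width), encode(seq_b, width)
    for _ in range(transient):
        a, b = step(a, rule), step(b, rule)
    total = 0.0
    for _ in range(window):
        a, b = step(a, rule), step(b, rule)
        total += hamming(a, b)
    return total / window


def distance_matrix(seqs, **kwargs):
    """Symmetric matrix of pairwise stationary distances, zero on the diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = stationary_distance(seqs[i], seqs[j], **kwargs)
    return d
```

Calling `distance_matrix(["ACDEFGHIK", "ACDEFGHIL", "WYWYWYWYW"])` returns a 3 x 3 symmetric matrix that could then be passed to a standard clustering or tree-building routine. Whether the resulting distances are informative depends strongly on the rule and encoding, which is presumably why the abstract stresses a "properly chosen" rule.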
Funder
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Publisher
Public Library of Science (PLoS)