Protein Sequence Comparison Method Based on 3-ary Huffman Coding
-
Published:2023-04
Issue:2
Volume:90
Page:357-380
-
ISSN:0340-6253
-
Container-title:MATCH Communications in Mathematical and in Computer Chemistry
-
Short-container-title:match
Author:
Qi Zhaohui, Ning Yingqiang, Huang Yinmei
Abstract
Based on the 3-ary Huffman coding algorithm, we propose a digital mapping method for protein sequences. First, a 3-ary Huffman tree is constructed from the frequencies of the 20 amino acids in the given protein sequences. The 0-2 codes that this tree assigns to the 20 amino acids convert long protein sequences into 0-2 digital sequences in a one-to-one manner. Using the frequency and distribution information of the amino acids' 0-2 codes in these digital sequences, we design 40-dimensional vectors to characterize the protein sequences. Next, the proposed method is applied to three separate tasks: similarity comparison of nine ND6 proteins, evolutionary trend analysis of the 2009 pandemic human influenza A (H1N1) viruses from January 2020 to June 2022, and evolutionary analysis of 95 coronavirus genes. The results illustrate the utility of the proposed method.
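The coding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ternary_huffman_codes` and `encode`, the tie-breaking order, and the zero-weight dummy leaf (added because a full 3-ary tree needs an odd number of leaves, and 20 is even) are assumptions made for illustration. The abstract does not define the 40-dimensional feature vector, so the sketch stops at the 0-2 digital sequence.

```python
import heapq
from collections import Counter
from itertools import count

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def ternary_huffman_codes(sequences):
    """Map each amino acid to a prefix-free code over the digits 0, 1, 2."""
    freq = Counter(ch for seq in sequences for ch in seq if ch in AMINO_ACIDS)
    tie = count()  # unique tie-breaker, so the heap never compares tree nodes
    heap = [(freq.get(a, 0), next(tie), a) for a in AMINO_ACIDS]
    # Assumption: pad with a zero-weight dummy leaf so that every merge
    # can combine exactly three nodes.
    if (len(heap) - 1) % 2 != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(3)]
        weight = sum(c[0] for c in children)
        heapq.heappush(heap, (weight, next(tie), [c[2] for c in children]))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):  # internal node: branches 0, 1, 2
            for digit, child in enumerate(node):
                walk(child, prefix + str(digit))
        elif node is not None:  # real leaf; the dummy leaf is skipped
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(seq, codes):
    """Convert a protein sequence into its 0-2 digital sequence."""
    return "".join(codes[ch] for ch in seq)


if __name__ == "__main__":
    # Toy sequences, chosen only to exercise the code.
    proteins = ["MKVLAAGIVGLLLA", "MSTNPKPQRKTKRNT"]
    table = ternary_huffman_codes(proteins)
    print(encode(proteins[0], table))
```

Because the codes are prefix-free, the digital sequence can be decoded back to the original protein unambiguously, which is what makes the mapping one-to-one. In this sketch `encode` raises `KeyError` on letters outside the 20 standard amino acids (for example `X`), so such residues need to be handled before encoding.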
Publisher
University Library in Kragujevac
Subject
Applied Mathematics, Computational Theory and Mathematics, Computer Science Applications, General Chemistry