Author:
Saha Paramita, Kumar Sarkar Bimal
Abstract
In this work, we analyze digitized sequences of genetic information using the notion of entropy. Special attention is paid to the occurrence of a particular pattern in the genetic sequence, and the occurrence of a genetic word is expressed as a density. The occurrence frequency of the q-gram genetic word of interest is determined along the sequence with a finite impulse response (FIR) type filter. This frequency is, in turn, used to determine horizontal correlations, i.e., correlations between words along the sequence. The probability distribution of genetic word occurrence serves as the input for calculating the entropy of the sequence. The sequence entropy is then used in a principal component analysis (PCA) to determine the similarity or dissimilarity between biological sequences. The technique is verified using 48 HEV genotypes.
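The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the filter taps, the window length, or the exact entropy definition. The sketch assumes an equal-weight moving average as the FIR filter and the Shannon entropy of the empirical q-gram distribution. The function names (`word_indicator`, `fir_density`, `qgram_entropy`) are hypothetical, and the PCA step is omitted.

```python
import math
from collections import Counter


def word_indicator(seq, word):
    """Return 1 at each position where `word` starts in `seq`, else 0."""
    q = len(word)
    return [1 if seq[i:i + q] == word else 0 for i in range(len(seq) - q + 1)]


def fir_density(indicator, window):
    """Occurrence density via a moving-average FIR filter.

    Assumption: all `window` taps equal 1/window; the paper may use
    different coefficients.
    """
    if window < 1 or window > len(indicator):
        raise ValueError("window must be between 1 and len(indicator)")
    running = sum(indicator[:window])
    density = [running / window]
    for i in range(window, len(indicator)):
        # Slide the window: add the entering sample, drop the leaving one.
        running += indicator[i] - indicator[i - window]
        density.append(running / window)
    return density


def qgram_entropy(seq, q):
    """Shannon entropy (in bits) of the empirical q-gram distribution."""
    counts = Counter(seq[i:i + q] for i in range(len(seq) - q + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    seq = "ACGTACGT"  # toy sequence, not real HEV data
    ind = word_indicator(seq, "AC")
    print(ind)
    print(fir_density(ind, window=4))
    print(qgram_entropy(seq, q=1))
```

Under these assumptions, computing one entropy value per choice of q (or per word) for each sequence gives a feature vector, and the resulting vectors for several sequences could then be passed to any standard PCA routine to compare them.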
Subject
General Physics and Astronomy
Cited by
5 articles.