A Modified Median String Algorithm for Gene Regulatory Motif Classification

Published: 2020-08-14 Issue: 8 Volume: 12 Page: 1363
ISSN: 2073-8994
Container-title: Symmetry
language: en
Short-container-title: Symmetry

Author:

Kaysar Mohammad Shibli, Khan Mohammad Ibrahim

Abstract

A consensus string is a significant feature of a deoxyribonucleic acid (DNA) sequence. The median string algorithm is one of the most popular exact algorithms for finding a DNA consensus. A DNA sequence is represented using the alphabet Σ = {a, c, g, t}. To search for a motif of length l, the algorithm generates the set of all 4^l possible motifs, or l-mers, over this alphabet and finds the consensus among them. The algorithm is guaranteed to return the consensus, but the problem is NP-complete and the runtime increases with the l-mer size. Using transition probabilities from a Markov chain, the proposed algorithm symmetrically generates four subsets of l-mers. Each subset contains a few l-mers starting with a particular letter. We use these reduced sets of l-mers instead of all 4^l l-mers. The experimental results show that the proposed algorithm produces far fewer l-mers and takes less time to execute. For l-mers of length 7, the proposed system is 48 times faster than the median string algorithm and produces only 2.5% of the l-mers that the median string algorithm generates. Compared with the recently proposed voting algorithm, the proposed algorithm is 4.4 times faster for a longer l-mer size such as 9.
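As a rough illustration of the baseline described in the abstract, the following is a minimal sketch of the exhaustive median string search, not the authors' implementation. The function names (`hamming`, `min_distance`, `total_distance`, `median_string`) and the toy input are assumptions made for this example. It enumerates all 4^l candidate l-mers and keeps the one with the smallest total Hamming distance to its best match in each sequence. The proposed modification, which prunes the candidates using Markov-chain transition probabilities, is not shown because the abstract does not specify it.

```python
from itertools import product

ALPHABET = "acgt"  # the alphabet Σ = {a, c, g, t} from the abstract


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(pattern, sequence):
    """Smallest Hamming distance between pattern and any window of sequence."""
    l = len(pattern)
    return min(
        hamming(pattern, sequence[i:i + l])
        for i in range(len(sequence) - l + 1)
    )


def total_distance(pattern, sequences):
    """Sum of the per-sequence minimum distances."""
    return sum(min_distance(pattern, s) for s in sequences)


def median_string(sequences, l):
    """Exhaustive search over all 4**l candidate l-mers (hypothetical helper)."""
    best, best_d = None, float("inf")
    for letters in product(ALPHABET, repeat=l):
        candidate = "".join(letters)
        d = total_distance(candidate, sequences)
        if d < best_d:
            best, best_d = candidate, d
    return best, best_d


if __name__ == "__main__":
    # Toy input (made up for illustration): "acgt" occurs in every sequence.
    toy = ["ttacgtcc", "ggacgtaa", "cacgtttt"]
    print(median_string(toy, 4))
```

The loop visits 4^l candidates, which is 16,384 for l = 7. Taking the abstract's figure of 2.5% at face value, the reduced candidate set would hold roughly 410 l-mers for that length.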

Publisher

MDPI AG

Subject

Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2073-8994/12/8/1363/pdf

References: 29 articles.

1. Methods in Comparative Genomics: Genome Correspondence, Gene Identification and Regulatory Motif Discovery

2. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

3. Consensus sequence Zen; Schneider; Appl. Bioinform., 2002

4. Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment

Cited by 3 articles.

1. Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification; Soft Computing; 2021-09-02

2. Boosting Perturbation-Based Iterative Algorithms to Compute the Median String; IEEE Access; 2021

3. Erratum: Mohammad Shibli Kaysar and Mohammad Ibrahim Khan A Modified Median String Algorithm for Gene Regulatory Motif Classification Symmetry 2020, 12, 8; Symmetry; 2020-08-30