Affiliation:
1. National Institute of Technology Tiruchirappalli
Abstract
A comprehensive understanding of transcription factor binding sites (TFBSs) is a key problem in contemporary biology and a critical issue in gene regulation. By identifying patterns of TFBSs across DNA sequences, motif discovery reveals basic regulatory relationships and helps in understanding the evolutionary system of each species. Recognizing a high-quality (ℓ, d) motif, however, remains a challenge. We address motif discovery and motif finding using approximate qPMS algorithms, namely S2F (Segmentation to Filtration) and FFF (Firefly with FREEZE). To this end, the whole DNA sequences are segmented into two portions. The first portion is sliced into base and sub k-mers, and the motif is calculated based on accuracy. The motif recognized in the first portion is then given as input to the FFF algorithm to identify the TFBS locations in the second portion. The performance of the algorithms is tested on both simulated and real datasets. In particular, real datasets such as Escherichia coli cyclic AMP receptor protein (CRP), mouse Embryonic Stem Cell (mESC), and human ChIP-seq data are explored. Results from the experiments show that the S2F and FFF algorithms can identify the motifs and appear faster than previous state-of-the-art PMS and qPMS algorithms.
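For readers unfamiliar with the (ℓ, d) formulation, the following minimal sketch illustrates the quorum planted motif condition that qPMS algorithms target: a candidate ℓ-mer counts as a motif if it occurs, with at most d mismatches, in at least a fraction q of the input sequences. This is not the S2F or FFF algorithm from the paper. The function names, the toy sequences, and the quorum parameter `q` are assumptions introduced only for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq matches motif with at most d mismatches."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def is_quorum_motif(motif: str, seqs: list[str], d: int, q: float) -> bool:
    """True if motif occurs (within d mismatches) in at least q * len(seqs) sequences."""
    hits = sum(occurs_within(motif, s, d) for s in seqs)
    return hits >= q * len(seqs)


# Toy input (hypothetical, not taken from the paper's datasets).
seqs = ["ACGTACGT", "TTACCTAA", "GGGGGGGG"]
print(is_quorum_motif("ACGT", seqs, d=1, q=0.6))
print(is_quorum_motif("ACGT", seqs, d=1, q=1.0))
```

This brute-force check only verifies a given candidate. Searching for all such candidates efficiently is the hard part that exact and approximate PMS/qPMS methods address.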
Publisher
Research Square Platform LLC