Authors:
Nicolae Marius, Rajasekaran Sanguthevar
Abstract
Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites and shRNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (ℓ, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers ℓ and d. It returns all sequences M of length ℓ that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (ℓ, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
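To make the (ℓ, d) and quorum definitions concrete, here is a minimal sketch of a naive exhaustive qPMS solver. It is an illustration of the problem statement only, not the qPMS9 algorithm: it simply tries all |Σ|^ℓ candidate motifs, so it is exponential in ℓ and only practical for tiny instances. The function names (`hamming`, `occurs_within`, `brute_force_qpms`), the DNA alphabet default, and the choice to pass q as an integer percentage are assumptions made for this example.

```python
from itertools import product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, s: str, d: int) -> bool:
    """True if some length-l substring of s differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def brute_force_qpms(strings: List[str], l: int, d: int, q: int = 100,
                     alphabet: str = "ACGT") -> List[str]:
    """Hypothetical naive qPMS: return every l-mer over `alphabet` that has a
    d-approximate occurrence in at least q% of `strings` (q=100 is plain PMS)."""
    n = len(strings)
    # Smallest integer count that is >= q% of n, using integer arithmetic only.
    need = (q * n + 99) // 100
    motifs = []
    # Enumerate all |alphabet|**l candidates; exponential in l, for illustration only.
    for letters in product(alphabet, repeat=l):
        m = "".join(letters)
        hits = sum(occurs_within(m, s, d) for s in strings)
        if hits >= need:
            motifs.append(m)
    return motifs


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TCGT"]
    print(brute_force_qpms(seqs, 3, 0, q=60))  # prints ['ACG', 'CGT']
    print(brute_force_qpms(seqs, 3, 0))        # prints [] (no exact 3-mer in all three)
```

With q = 60 and n = 3, a motif must occur in at least 2 strings; `ACG` and `CGT` each occur exactly in two of them. Raising d to 1 with q = 100 admits `ACG`, since `TCG` in the third string is within one mismatch. Practical exact solvers such as qPMS9 avoid this full enumeration.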
Publisher
Springer Science and Business Media LLC