Affiliation:
1. Peking University, China
2. City University of Hong Kong, China
Abstract
Motif identification for DNA sequences has many important applications in biological studies, including diagnostic probe design, locating binding sites and regulatory signals, and potential drug target identification. There are two versions of the problem: the single-group version and the two-group version. In this setting, the occurrences of the motif in the given sequences contain errors. Currently, most existing programs can handle only the single-group case. Moreover, most of these programs do not allow indels (insertions and deletions) in the occurrences of the motif. In this paper, the authors propose a randomized algorithm for the single-group problem that can handle indels in the occurrences of the motif. Finally, an algorithm for the two-group problem is given, along with extensive simulations evaluating the algorithms.