Efficient relaxed search in hierarchically clustered sequence datasets

Published:2012-07 Volume:17
ISSN:1084-6654
Container-title:ACM Journal of Experimental Algorithmics
language:en
Short-container-title:ACM J. Exp. Algorithmics

Author:

Bader Kai C.¹, Atallah Mikhail J.², Grothoff Christian¹

Affiliation:

1. Technische Universität München, Germany

2. Purdue University, IN

Abstract

This article presents a new algorithm for finding oligonucleotide signatures that are specific and sensitive for organisms or groups of organisms in large-scale sequence datasets. We assume that the organisms have been organized in a hierarchy, for example, a phylogenetic tree. The resulting signatures, which serve as binding sites for primers and probes, match the maximum possible number of organisms in the target group while having at most k matches outside of the target group. The key step in the algorithm is the use of the lowest common ancestor (LCA) to search the organism hierarchy; this allows the combinatorial problem to be solved in almost linear time (empirically observed). Compared to the best-known previous algorithms, the presented algorithm improves both memory consumption and runtime by several orders of magnitude while giving identical, exact solutions. This article gives a formal description of the algorithm, discusses details of our concrete, publicly available implementation, and presents the results from our performance evaluation.
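
The role of the LCA described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tree encoding (a `parent` dictionary), the function names, and the toy hierarchy are all assumptions made for illustration. It only shows how the LCA of the organisms matched by a signature identifies the smallest group that contains every match, and how a relaxed check allows at most k matches outside a chosen target group.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Parent = Dict[str, Optional[str]]  # hypothetical encoding: node -> parent (None for the root)


def ancestors(node: str, parent: Parent) -> List[str]:
    """Path from node up to the root, including node itself."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(nodes: Iterable[str], parent: Parent) -> str:
    """Deepest node that is an ancestor of (or equal to) every given node."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("need at least one node")
    first_path = ancestors(nodes[0], parent)
    common = set(first_path)
    for n in nodes[1:]:
        common &= set(ancestors(n, parent))
    # first_path is ordered deepest-first, so the first shared node is the LCA.
    for candidate in first_path:
        if candidate in common:
            return candidate
    raise ValueError("nodes do not share a root")


def leaves_under(node: str, parent: Parent) -> Set[str]:
    """All leaves (organisms) in the subtree rooted at node."""
    inner = {p for p in parent.values() if p is not None}
    return {n for n in parent if n not in inner and node in ancestors(n, parent)}


def relaxed_hits(matched: Set[str], target: str, parent: Parent,
                 k: int) -> Optional[Tuple[int, int]]:
    """(ingroup, outgroup) hit counts, or None if more than k outgroup hits."""
    group = leaves_under(target, parent)
    ingroup, outgroup = len(matched & group), len(matched - group)
    return (ingroup, outgroup) if outgroup <= k else None


# Toy hierarchy (assumed): root has groups A and B; leaves are organisms.
PARENT: Parent = {
    "root": None, "A": "root", "B": "root",
    "a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B",
}

if __name__ == "__main__":
    print(lowest_common_ancestor({"b1", "b2"}, PARENT))
    print(relaxed_hits({"b1", "b2", "b3", "a1"}, "B", PARENT, k=1))
```

In this toy tree, a signature matching only `b1` and `b2` has group `B` as its LCA, so it has no matches outside `B`. A signature that also matches `a1` is accepted for target `B` only when k is at least 1.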

Funder

Qatar Foundation

Division of Computer and Network Systems

Division of Computing and Communication Foundations

Publisher

Association for Computing Machinery (ACM)

Subject

Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2133803.2212315

References: 31 articles.

1. Replacing suffix trees with enhanced suffix arrays

2. Single-cell identification in microbial communities by improved fluorescence in situ hybridization techniques

3. Phylogenetic identification and in situ detection of individual microbial cells without cultivation;Amann R. I.;Microbiol Rev.,1995

4. Bader K. C., Eissler T., Evans N., GauthierDickey C., Grothoff C., Grothoff K., Keene J., Meier H., Ritzdorf C., and Rutherford M. J. 2010. Distributed stream processing with DUP. In Network and Parallel Computing, Lecture Notes in Computer Science, vol. 6289, Springer, Berlin, 232--246.

5. Comprehensive and relaxed search for oligonucleotide signatures in hierarchically clustered sequence datasets