Multiple Buffering for Parallel Approximate Sequence Matching using Disk-based Suffix Tree on Multi-core CPU

Published:2013-12 Issue:3 Volume:3
ISSN:2010-2283
Container-title:GSTF Journal on Computing (JoC)
language:en
Short-container-title:GSTF J Comput

Author:

Tamura Keiichi, Watanuki Yousuke, Kitakami Hajime, Takahashi Yoshifumi

Abstract

Suffix trees, which are trie structures that represent the suffixes of sequences (e.g., strings), are widely used for sequence search in application domains such as text data mining, bioinformatics, and computational biology. Suffix trees are particularly useful in bioinformatics applications because they can search for similar sub-sequences and extract frequent sequence patterns efficiently. In recent years, the efficient construction of suffix trees that allow faster sequence searches has become one of the most important challenges, because the number and size of the sequences stored in sequence databases have been increasing exponentially. This paper proposes a novel parallelization model, targeting a multi-core CPU, for approximate sequence matching with disk-based suffix trees, that is, suffix trees built on hard disks rather than in memory. In the proposed parallelization model, we divide an entire sequence database into two or more sub-databases called partitions. For each partition, we build a disk-based suffix tree, and we define a task as an approximate sequence matching on one disk-based suffix tree. Moreover, the proposed parallelization model includes a multiple buffering management system to avoid conflicts among CPU cores. We evaluated the proposed parallelization model on a PC using an actual amino acid sequence database. The experimental results show a substantial improvement in computation performance.
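
The partition-and-task structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the disk-based suffix tree is replaced here by a plain dynamic-programming scan (Sellers' algorithm) over each sequence, the multiple buffering management system is omitted entirely, and all names (`partition`, `within_k_errors`, `search_task`, `parallel_search`) are hypothetical. It only shows how a database might be split into partitions, with one approximate-matching task per partition.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(sequences, n_parts):
    """Split the database round-robin into n_parts sub-databases (partitions)."""
    return [sequences[i::n_parts] for i in range(n_parts)]


def within_k_errors(text, pattern, k):
    """Return True if some substring of text is within edit distance k of pattern.

    Stand-in for a suffix tree search (assumption): Sellers' column-wise
    dynamic programming, where a match may start at any text position.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    if prev[m] <= k:
        return True
    for ch in text:
        cur = [0]  # zero cost to start a match at this text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in text
                           cur[i - 1] + 1))     # missing pattern character
        if cur[m] <= k:
            return True
        prev = cur
    return False


def search_task(part, pattern, k):
    """One task: approximate matching restricted to a single partition."""
    return [seq for seq in part if within_k_errors(seq, pattern, k)]


def parallel_search(sequences, pattern, k, n_parts=2):
    """Run one task per partition and merge the per-partition hits."""
    parts = partition(sequences, n_parts)
    # Threads keep the sketch simple; in CPython a process pool would be
    # needed for real CPU parallelism across cores.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = pool.map(lambda p: search_task(p, pattern, k), parts)
    return [hit for part_hits in results for hit in part_hits]
```

For example, `parallel_search(["MKTAYIAKQR", "GGGGGGGG", "MKTAYLAKQR"], "TAYIAK", 1)` returns the first and third sequences, since each contains a substring within one edit of the pattern.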

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.7603/s40601-013-0022-0.pdf

References: 20 articles.

1. P. Weiner, “Linear pattern matching algorithms,” in Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973), SWAT ’73, pp. 1–11, 1973.

2. E. M. McCreight, “A space-economical suffix tree construction algorithm,” Journal of the ACM, vol. 23, pp. 262–272, Apr. 1976.

3. D. Gusfield, Algorithms on strings, trees, and sequences: computer science and computational biology. New York, NY, USA: Cambridge University Press, 1997.

4. Y. Tian, S. Tata, R. A. Hankins, and J. M. Patel, “Practical methods for constructing suffix trees,” The VLDB Journal, vol. 14, no. 3, pp. 281–299, 2005.

5. B. Phoophakdee and M. J. Zaki, “Genome-scale disk-based suffix tree indexing,” in Proceedings of the 2007 ACM SIGMOD international conference on Management of data, SIGMOD ’07, pp. 833–844, 2007.