Improved representation of sequence Bloom trees

Published: 2018-12-19

Authors:

Robert S. Harris, Paul Medvedev

Abstract

Algorithmic solutions to index and search biological databases are a fundamental part of bioinformatics, providing underlying components to many end-user tools. Inexpensive next-generation sequencing has filled publicly available databases such as the Sequence Read Archive beyond the capacity of traditional indexing methods. Recently, the Sequence Bloom Tree (SBT) and its derivatives were proposed as a way to efficiently index such data for queries about transcript presence. We build on the SBT framework to construct the HowDe-SBT data structure, which uses a novel partitioning of information to reduce the construction and query time as well as the size of the index. We evaluate HowDe-SBT both by proving theoretical bounds on its performance and by using real RNA-seq data. Compared to previous SBT methods, HowDe-SBT can construct the index in less than 36% of the time and with 39% less space, and can answer small-batch queries at least five times faster. HowDe-SBT is available as a free, open-source program at https://github.com/medvedevgroup/HowDeSBT.
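For readers new to the baseline that HowDe-SBT builds on, the sketch below illustrates a classic SBT query. It does not reproduce HowDe-SBT's partitioned node representation, which the abstract does not detail. Each leaf holds a Bloom filter of one sample's k-mers, each internal node holds the bitwise union of its children, and a query descends only into subtrees whose filter reports at least a fraction theta of the query's k-mers. The class and function names, the salted SHA-256 hashing, the filter size, the pairwise tree construction, and the toy samples are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings; sizes are arbitrary illustrative choices."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit vector

    def _positions(self, item):
        # Hypothetical hashing: salt SHA-256 with the hash index to get num_hashes positions.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May give false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


class Node:
    def __init__(self, bf, children=(), name=None):
        self.bf = bf
        self.children = list(children)
        self.name = name  # sample name, set only on leaves


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def leaf(name, seq, k):
    bf = BloomFilter()
    for km in kmers(seq, k):
        bf.add(km)
    return Node(bf, name=name)


def build(leaves):
    # Hypothetical bottom-up pairing; each parent filter is the union of its children.
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                a, b = level[i], level[i + 1]
                nxt.append(Node(a.bf.union(b.bf), [a, b]))
            else:
                nxt.append(level[i])  # an odd node is promoted unchanged
        level = nxt
    return level[0]


def query(root, seq, k, theta):
    """Names of leaves whose filter reports at least a theta fraction of seq's k-mers."""
    qk = kmers(seq, k)
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        present = sum(1 for km in qk if km in node.bf)
        if present < theta * len(qk):
            # A union filter reports every k-mer any descendant filter reports,
            # so if this node fails the threshold, no leaf below it can pass.
            continue
        if node.children:
            stack.extend(node.children)
        else:
            hits.append(node.name)
    return sorted(hits)


k = 5
samples = {"s1": "ACGTACGTGACCTAG", "s2": "TTGACCATGGCATGC", "s3": "ACGTACGTTTTTTTT"}
root = build(leaf(n, s, k) for n, s in samples.items())
print(query(root, "ACGTACGTGA", k, theta=0.9))
```

Sample s1 contains every k-mer of the query, so it is always reported because Bloom filters have no false negatives; s2 or s3 could appear only through false positives. The pruning step is what makes the tree cheaper to search than checking every leaf filter.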

Publisher

Cold Spring Harbor Laboratory

Cited by 7 articles.

1. Representation of k-Mer Sets Using Spectrum-Preserving String Sets. Journal of Computational Biology, 2021-04-01.

2. The statistics of k-mers from a sequence undergoing a simple mutation process without spurious matches. 2021-01-17.

3. REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets. 2020-03-30.

4. Representation of k-mer sets using spectrum-preserving string sets. 2020-01-08.

5. Representation of k-mer Sets Using Spectrum-Preserving String Sets. Lecture Notes in Computer Science, 2020.