REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets

Published: 2020-03-30

Author:

Camille Marchet, Zamin Iqbal, Daniel Gautheret, Mikael Salson, Rayan Chikhi

Abstract

Motivation: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets.

Results: We used REINDEER to index the abundances of sequences within 2,585 human RNA-seq experiments in 45 hours using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of 4 billion distinct k-mers across 2,585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph (DBG) of each dataset, then conceptually merges those DBGs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances.

Availability: https://github.com/kamimrcht/REINDEER

Contact: camille.marchet@univ-lille.fr
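
The idea in the abstract of recording k-mer abundances across datasets and grouping k-mers of similar abundance can be illustrated with a toy sketch. This is not REINDEER's implementation: it builds no de Bruijn graph, ignores reverse complements, and groups k-mers only when their abundance vectors are exactly identical, as a simplified stand-in for monotigs. All function names here (`count_kmers`, `build_index`, `group_by_abundance`, `query`) are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_index(datasets, k):
    """Map each k-mer to a tuple of abundances, one entry per dataset."""
    per_dataset = [count_kmers(reads, k) for reads in datasets]
    all_kmers = set().union(*per_dataset)
    # Counter returns 0 for k-mers absent from a dataset.
    return {kmer: tuple(c[kmer] for c in per_dataset) for kmer in all_kmers}


def group_by_abundance(index):
    """Group k-mers sharing the same abundance vector (assumed simplification
    of monotigs), so that each distinct vector needs to be stored only once."""
    groups = defaultdict(list)
    for kmer, vector in index.items():
        groups[vector].append(kmer)
    return {vector: sorted(kmers) for vector, kmers in groups.items()}


def query(index, kmer, n_datasets):
    """Return the abundance of a k-mer in each dataset (all zeros if absent)."""
    return index.get(kmer, (0,) * n_datasets)


# Toy collection: two "datasets", each a list of reads.
datasets = [["ACGTAC", "ACGT"], ["CGTACG"]]
index = build_index(datasets, k=4)
groups = group_by_abundance(index)
```

In this toy collection, `query(index, "ACGT", 2)` reports an abundance of 2 in the first dataset and 0 in the second, and the k-mers `CGTA` and `GTAC` fall into the same group because both occur once in each dataset. A nonzero entry in the returned tuple also answers the presence/absence question for that dataset.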

Publisher

Cold Spring Harbor Laboratory

Cited by 4 articles.

1. Disk compression of k-mer sets;Algorithms for Molecular Biology;2021-06-21

2. Data structures based on k-mers for querying large collections of sequencing data sets;Genome Research;2020-12-16

3. Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences;2020-10-08

4. Data structures based on k-mers for querying large collections of sequencing datasets;2019-12-06