RedOak: a reference-free and alignment-free structure for indexing a collection of similar genomes

Published: 2020-12-21

Author:

Agret Clément, Chateau Annie, Droc Gaetan, Sarah Gautier, Ruiz Manuel, Mancheron Alban

Abstract

Background: As the cost of DNA sequencing decreases, high-throughput sequencing technologies become increasingly accessible to many laboratories. Consequently, new issues emerge that require new algorithms, including tools for indexing and compressing hundreds to thousands of complete genomes.

Results: This paper presents RedOak, a reference-free and alignment-free software package for indexing a large collection of similar genomes. RedOak can also be applied to reads from unassembled genomes, and it provides a nucleotide sequence query function. The software is based on a k-mer approach and has been developed to be heavily parallelized and distributed across several nodes of a cluster. The source code of the RedOak algorithm is available at https://gite.lirmm.fr/doccy/RedOak.

Conclusions: RedOak may be useful for biologists and bioinformaticians who need to extract information from large sequence datasets.
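To make the k-mer approach mentioned in the abstract concrete, here is a minimal Python sketch of indexing a small collection of genomes by their k-mers and querying it with a nucleotide sequence. It is not RedOak's data structure or API: the function names (`kmers`, `build_index`, `query`), the toy sequences, and the choice of `k=4` are assumptions made for illustration only, and the sketch is neither compressed, parallelized, nor distributed.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Yield every k-mer (substring of length k) of a DNA string."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_index(genomes, k):
    """Map each k-mer to the set of genome names that contain it."""
    index = defaultdict(set)
    for name, sequence in genomes.items():
        for kmer in kmers(sequence.upper(), k):
            index[kmer].add(name)
    return index


def query(index, sequence, k):
    """For each genome, return the fraction of the query's k-mers it contains."""
    query_kmers = list(kmers(sequence.upper(), k))
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids creating empty entries for k-mers absent from the index
        for name in index.get(kmer, ()):
            hits[name] += 1
    return {name: count / len(query_kmers) for name, count in hits.items()}


# Hypothetical toy collection: two similar sequences differing at one position.
genomes = {
    "genome_a": "ACGTACGTGA",
    "genome_b": "ACGTTCGTGA",
}
index = build_index(genomes, k=4)
result = query(index, "ACGTAC", k=4)
```

Because no reference genome and no alignment are involved, the query only reports which indexed genomes share k-mers with the input sequence, and in what proportion.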

Publisher

Cold Spring Harbor Laboratory

References: 29 articles.

1. Computational pan-genomics: status, promises and challenges

2. Towards plant pangenomics

3. An experimental study of a compressed index

4. The Burrows-Wheeler Transform between Data Compression and Combinatorics on Words

5. Suffix Arrays: A New Method for On-Line String Searches

Cited by 1 article.

1. Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets; 2024-01-31