Affiliation:
1. Institute of Applied Computer Science, Lodz University of Technology, Lodz 90-924, Poland
Abstract
Motivation
The amount of sequencing data from high-throughput sequencing technologies grows at a pace exceeding the one predicted by Moore’s law. One of the basic requirements is to efficiently store and transmit such huge collections of data. Despite significant interest in designing FASTQ compressors, they are still imperfect in terms of compression ratio or decompression resources.
Results
We present Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of building an approximation of the shortest common superstring over high-quality reads. Experiments show that PgRC wins in compression ratio over its main competitors, SPRING and Minicom, by up to 15% and 20% on average, respectively, while being comparably fast in decompression.
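To illustrate the idea of approximating a shortest common superstring over reads, a minimal sketch follows. It uses the textbook greedy heuristic (repeatedly merge the two strings with the longest suffix-prefix overlap), which is an assumption made for illustration and not necessarily the construction used in PgRC. The function names `overlap` and `greedy_superstring` are hypothetical, and the quadratic pair search is only practical for toy inputs.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Greedy approximation of a shortest common superstring (illustrative only)."""
    # Drop exact duplicates, then reads fully contained in another read.
    unique = list(dict.fromkeys(reads))
    pool = [r for r in unique if not any(r != s and r in s for s in unique)]

    while len(pool) > 1:
        # Find the ordered pair (i, j) with the largest suffix-prefix overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair, writing the overlapping part only once.
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [r for idx, r in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)

    return pool[0] if pool else ""


reads = ["ACGTAC", "GTACGG", "ACGGTT"]
pseudogenome = greedy_superstring(reads)
print(pseudogenome, len(pseudogenome))
```

In this toy input the three 6-base reads (18 bases in total) collapse into a single 10-base string that contains each read as a substring, so each read could then be represented by its position in that string rather than by its bases.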
Availability and implementation
PgRC can be downloaded from https://github.com/kowallus/PgRC.
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
Smart Growth Operational Program
Polish National Centre for Research and Development
Institute of Applied Computer Science
Lodz University of Technology
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
Cited by
14 articles.