SFQ: Constructing and Querying a Succinct Representation of FASTQ Files

Published:2022-06-04 Issue:11 Volume:11 Page:1783
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Bakarić Robert, Korenčić Damir, Hršak Dalibor, Ristov Strahil

Abstract

A large and ever-increasing quantity of high-throughput sequencing (HTS) data is stored in FASTQ files. Various data compression methods are used to reduce storage and transmission costs, from the still prevalent general-purpose Gzip to state-of-the-art specialized methods. However, all existing methods for FASTQ file compression require a decompression stage before the HTS data can be used. This is particularly costly for random access to specific records in FASTQ files. We propose the sFASTQ format, a succinct representation of FASTQ files that can be used without decompression (i.e., the records can be retrieved and listed online) and that supports random access to individual records. The sFASTQ format can be searched on disk, which eliminates the need for additional memory resources. The searchable sFASTQ archive is of comparable size to the corresponding Gzip file. Querying the sFASTQ format outputs (interleaved) FASTQ records to the STDOUT stream. We provide SFQ, software for the construction and usage of the sFASTQ format that supports variable-length reads, pairing of records, and both lossless and lossy compression of quality scores.
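The random-access cost mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the SFQ implementation or the sFASTQ format. It only shows the baseline the abstract argues against: fetching the n-th record from a Gzip-compressed FASTQ file means decompressing and scanning every record before it. The function names `read_fastq` and `nth_record_gzip` are hypothetical, and the sketch assumes strict four-line records without line wrapping.

```python
import gzip
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a text handle.

    Assumption: every record occupies exactly four lines.
    """
    while True:
        lines = list(islice(handle, 4))
        if not lines:
            return
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = (line.rstrip("\n") for line in lines)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        yield header, seq, qual


def nth_record_gzip(data, n):
    """Return the n-th (0-based) record from gzip-compressed FASTQ bytes.

    The stream is decompressed from the start, so the cost grows with n.
    """
    with gzip.open(io.BytesIO(data), "rt") as handle:
        record = next(islice(read_fastq(handle), n, None), None)
    if record is None:
        raise IndexError("record index out of range")
    return record


# Tiny in-memory example with two made-up reads.
sample = b"@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nFFFF\n"
compressed = gzip.compress(sample)
print(nth_record_gzip(compressed, 1))  # ('@r2', 'GGCA', 'FFFF')
```

A format with built-in random access, as the abstract describes for sFASTQ, would avoid this sequential scan; how SFQ achieves that is described in the paper itself, not in this sketch.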

Funder

Croatian Science Foundation

European Regional Development Fund

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/11/1783/pdf

Cited by 1 article.

1. A Comprehensive Survey on Knowledge-Defined Networking; Telecom; 2023-08-02