Compression of Next-Generation Sequencing Data and of DNA Digital Files

Published:2020-06-24 Issue:6 Volume:13 Page:151
ISSN:1999-4893
Container-title:Algorithms
language:en
Short-container-title:Algorithms

Author:

Carpentieri Bruno

Abstract

The memory and network traffic consumed by newly sequenced biological data have grown sharply in recent years. Genomic projects such as HapMap and 1000 Genomes have contributed to the very large growth of databases and network traffic related to genomic data, and to the development of new, efficient technologies. Large-scale sequencing of DNA samples has drawn new attention and prompted new research, and the scientific community's interest in genomic data has greatly increased. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases, and infrastructures to support genomics research. In this paper, we analyze different approaches for compressing digital files that are generated by Next-Generation Sequencing tools and contain nucleotide sequences, and we discuss and evaluate the compression performance of generic compression algorithms by comparing them with Quip, a system designed by Jones et al. specifically for genomic file compression. Moreover, we present a simple but effective technique for compressing DNA sequences that considers only the relevant DNA data, and we experimentally evaluate its performance.

Publisher

MDPI AG

Subject

Computational Mathematics,Computational Theory and Mathematics,Numerical Analysis,Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/13/6/151/pdf

References: 29 articles.

1. 1000 Genomes: A Deep Catalog of Human Genetic Variation. https://www.internationalgenome.org/

2. Challenges in funding and developing genomic software: roots and remedies

3. Genomic Data Compression

4. Next Generation Sequencing Data and its Compression;Carpentieri

Cited by 4 articles.

1. A Universal Non-parametric Approach for Improved Molecular Sequence Analysis;Lecture Notes in Computer Science;2024

2. DNA Data Encoding and Compression Using Image Compression Algorithms;Lecture Notes in Networks and Systems;2024

3. An intelligent ubiquitous compression technique for DNA sequencing using Hadoop;The Journal of Engineering;2022-09-07

4. 2020 Selected Papers from Algorithms’ Editorial Board Members;Algorithms;2021-01-21