Ultrafast one‐pass FASTQ data preprocessing, quality control, and deduplication using fastp

Published:2023-05 Issue:2 Volume:2
ISSN:2770-596X
Container-title:iMeta
language:en
Short-container-title:iMeta

Author:

Chen Shifu¹²

Affiliation:

1. HaploX Biotechnology, Shenzhen, China

2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

Abstract

A large amount of sequencing data is generated and processed every day with the continuous evolution of sequencing technology and the expansion of sequencing applications. One consequence of such sequencing data explosion is the increasing cost and complexity of data processing. The preprocessing of FASTQ data, which means removing adapter contamination, filtering low‐quality reads, and correcting wrongly represented bases, is an indispensable but resource‐intensive part of sequencing data analysis. Therefore, although a lot of software applications have been developed to solve this problem, bioinformatics scientists and engineers are still pursuing faster, simpler, and more energy‐efficient software. Several years ago, the author developed fastp, which is an ultrafast all‐in‐one FASTQ data preprocessor with many modern features. This software has been approved by many bioinformatics users and has been continuously maintained and updated. Since the first publication on fastp, it has been greatly improved, making it even faster and more powerful. For instance, the duplication evaluation module has been improved, and a new deduplication module has been added. This study aimed to introduce the new features of fastp and demonstrate how it was designed and implemented.

Funder

Science, Technology and Innovation Commission of Shenzhen Municipality

Publisher

Wiley

Subject

Microbiology, Biotechnology

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/imt2.107
