SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

Published:2010-01-20 Issue:1 Volume:11
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Falgueras Juan, Lara Antonio J, Fernández-Pozo Noé, Cantón Francisco R, Pérez-Trabado Guillermo, Claros M Gonzalo

Abstract

Background: High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms.

Results: SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming.

Conclusions: SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-11-38.pdf

Cited by 151 articles.

1. Step-by-Step Metagenomics for Food Microbiome Analysis: A Detailed Review;Foods;2024-07-14

2. CapP mediates the structural formation of biofilm-specific pili in the opportunistic human pathogen Bacillus cereus;2024-02-27

3. Multiomics analyses reveal the central role of the nucleolus and its machinery during heat stress acclimation in Pinus radiata;Journal of Experimental Botany;2024-02-06

4. Transcriptomic Insight into the Pollen Tube Growth of Olea europaea L. subsp. europaea Reveals Reprogramming and Pollen-Specific Genes Including New Transcription Factors;Plants;2023-08-08

5. Sporulation Activated via σ^W Protects Bacillus from a Tse1 Peptidoglycan Hydrolase Type VI Secretion System Effector;Microbiology Spectrum;2023-04-13