ParPEST: a pipeline for EST data analysis based on parallel computing

Published: 2005-12 Issue: S4 Volume: 6 Page:
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Author:

D'Agostino Nunzio, Aversano Mario, Chiusano Maria Luisa

Abstract

Background: Expressed Sequence Tags (ESTs) are short, error-prone DNA sequences generated from the 5' and 3' ends of randomly selected cDNA clones. They are an important resource for comparative and functional genomic studies and also provide reliable information for the annotation of genomic sequences. Because of advances in biotechnology, ESTs are produced daily as large datasets. Suitable and efficient bioinformatic approaches are therefore necessary to organize the data and their information content for further investigation.

Results: We implemented ParPEST (Parallel Processing of ESTs), a pipeline based on parallel computing for EST analysis. The results are organized in a data warehouse that provides a starting point for mining expressed sequence datasets. The collected information supports investigations of data quality and information content, and is enriched by a preliminary functional annotation.

Conclusion: The pipeline has been developed to perform an exhaustive and reliable analysis of EST data and to provide a curated set of information based on a relational database. It is also designed to reduce the execution time of the specific steps required for a complete analysis by using distributed processes and parallelized software. It is conceived to run on hardware with modest requirements, so that it can meet the growing demand typical of this kind of data and scale at affordable cost.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-6-S4-S9.pdf

References: 34 articles.

1. Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics 2001, 17: 1093–104. 10.1093/bioinformatics/17.12.1093

2. SeqClean, a software for vector trimming [http://www.tigr.org/tdb/tgi/software/]

3. PHRAP software [http://www.phrap.org/]

4. RepeatMasker software [http://www.repeatmasker.org/]

5. Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol 2000, 7: 203–214. 10.1089/10665270050081478

Cited by 32 articles.

1. Bioinformatics resources for pollen;Plant Reproduction;2016-06

2. Bioinformatics for agriculture in the Next-Generation sequencing era;Chemical and Biological Technologies in Agriculture;2016-04-02

3. Standalone EST microsatellite mining and analysis tool (SEMAT): for automated EST-SSR analysis in plants;Tree Genetics & Genomes;2014-08-12

4. ESTs in Plants: Where Are We Heading?;Agricultural Bioinformatics;2014

5. Towards Positive Unlabeled Learning for Parallel Data Mining: A Random Forest Framework;Advanced Data Mining and Applications;2014