preAssemble: a tool for automatic sequencer trace data processing

Authors:

Adzhubei Alexei A, Laerdahl Jon K, Vlasova Anna V

Abstract

Background: Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment (sequencers). Each file contains information that specialised software can interpret to reveal the sequence (base calling). This is done either by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands. Assessing sequence quality against various criteria is important at the stage preceding clustering and contig assembly. preAssemble uses two major publicly available packages, Phred and Staden, to perform sequence quality processing.

Results: The preAssemble pre-assembly sequence processing pipeline has been developed for small- to large-scale automatic processing of DNA sequencer chromatogram (trace) data. The pipeline uses the Staden Package Pregap4 module and the base-calling program Phred, and produces detailed, self-explanatory output that can be displayed in a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and Linux operating systems. It is available for download and runs as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server.

Conclusion: preAssemble is a tool for quality assessment of sequences generated by automatic sequencing equipment. It is flexible, since both interactive jobs on the preAssemble server and a stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job, while options for parameter tuning are provided for those who need them. Consequently, preAssemble can be used as efficiently for just a few trace files as for large-scale sequence processing.
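As a rough illustration of the kind of quality assessment described above, the sketch below uses the standard Phred relation between a quality score Q and the base-call error probability P, Q = -10 log10(P). It is not preAssemble's own code: the function names, the threshold of Q >= 20, and the 80% cutoff are hypothetical values chosen only for this example.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_quality(quals: list[int], min_q: int = 20, min_fraction: float = 0.8) -> bool:
    """Hypothetical read-level filter (not preAssemble's actual criteria).

    A read passes if at least `min_fraction` of its bases have a
    quality score of `min_q` or higher. Empty reads always fail.
    """
    if not quals:
        return False
    good = sum(1 for q in quals if q >= min_q)
    return good / len(quals) >= min_fraction


if __name__ == "__main__":
    read_quals = [30, 30, 30, 30, 10]  # made-up per-base quality scores
    print(phred_to_error_prob(20))     # error probability for Q20
    print(passes_quality(read_quals))  # 4 of 5 bases are at or above Q20
```

Under these assumptions, a score of Q20 corresponds to roughly one expected error in 100 base calls. A real pipeline would apply further criteria beyond a simple per-read fraction.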

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

References (7 articles)

1. Ewing B, Hillier LD, Wendl MC, Green P: Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment. Genome Res 1998, 8(3):175–185.

2. Ewing B, Green P: Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Res 1998, 8(3):186–194.

3. Bonfield J, Beal K, Cheng Y, Jordan M, Staden R: Staden Package. 1995. [http://staden.sourceforge.net/]

4. Staden R, Beal KF, Bonfield JK: The Staden Package. In Computer Methods in Molecular Biology. Volume 132. Edited by: Misener S, Krawetz S. Totowa, NJ 07512, The Humana Press Inc.; 1998:115–130.

5. Adzhubei AA, Laerdahl JK, Vlasova AV, Ruden TA: Norwegian Salmon Genome Project database and web site. 2002. [http://www.salmongenome.no]
