Optimization of de novo transcriptome assembly from next-generation sequencing data

Published:2010-08-06 Issue:10 Volume:20 Page:1432-1440
ISSN:1088-9051
Container-title:Genome Research
language:en
Short-container-title:Genome Res.

Author:

Surget-Groba Yann, Montoya-Burgos Juan I.

Abstract

Transcriptome analysis has important applications in many biological fields. However, assembling a transcriptome without a known reference remains a challenging task requiring algorithmic improvements. We present two methods for substantially improving transcriptome de novo assembly. The first method relies on the observation that the use of a single k-mer length by current de novo assemblers is suboptimal to assemble transcriptomes where the sequence coverage of transcripts is highly heterogeneous. We present the Multiple-k method in which various k-mer lengths are used for de novo transcriptome assembly. We demonstrate its good performance by assembling de novo a published next-generation transcriptome sequence data set of Aedes aegypti, using the existing genome to check the accuracy of our method. The second method relies on the use of a reference proteome to improve the de novo assembly. We developed the Scaffolding using Translation Mapping (STM) method that uses mapping against the closest available reference proteome for scaffolding contigs that map onto the same protein. In a controlled experiment using simulated data, we show that the STM method considerably improves the assembly, with few errors. We applied these two methods to assemble the transcriptome of the non-model catfish Loricaria gr. cataphracta. Using the Multiple-k and STM methods, the assembly increases in contiguity and in gene identification, showing that our methods clearly improve quality and can be widely used. The new methods were used to assemble successfully the transcripts of the core set of genes regulating tooth development in vertebrates, while classic de novo assembly failed.
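The scaffolding idea behind STM (contigs whose translations map onto the same reference protein are joined) can be illustrated with a small sketch. This is not the authors' implementation: the `Hit` record, the `scaffold_by_protein` function, the rule for skipping overlapping hits, and the `N` gap filler are all hypothetical choices made for illustration. Strand handling and frame checks are ignored, and the hit coordinates are assumed to come from some prior translated search of contigs against a proteome.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One contig-to-protein match (hypothetical record layout)."""
    contig_id: str    # name of the assembled contig
    protein_id: str   # reference protein the translated contig maps onto
    prot_start: int   # 1-based start of the match on the protein
    prot_end: int     # 1-based end of the match on the protein


def scaffold_by_protein(hits, sequences, gap="N" * 10):
    """Join contigs that map onto the same protein, ordered along the protein.

    Returns {protein_id: (ordered contig ids, joined sequence)}.
    Assumption: a contig whose protein interval overlaps the previously
    accepted one is skipped rather than merged (a toy rule).
    """
    by_protein = defaultdict(list)
    for hit in hits:
        by_protein[hit.protein_id].append(hit)

    scaffolds = {}
    for protein_id, group in by_protein.items():
        group.sort(key=lambda h: h.prot_start)
        ordered = []
        last_end = 0
        for hit in group:
            if hit.prot_start <= last_end:
                continue  # overlapping match: left out of the scaffold here
            ordered.append(hit)
            last_end = hit.prot_end
        scaffolds[protein_id] = (
            [h.contig_id for h in ordered],
            gap.join(sequences[h.contig_id] for h in ordered),
        )
    return scaffolds


# Toy data: two contigs hit protein P1 in different regions, one hits P2.
hits = [
    Hit("c2", "P1", 120, 200),
    Hit("c1", "P1", 1, 100),
    Hit("c3", "P2", 5, 50),
]
sequences = {"c1": "ATGAAA", "c2": "CCCTAA", "c3": "ATGGGG"}
result = scaffold_by_protein(hits, sequences, gap="NNN")
print(result)
```

In this toy run, contigs c1 and c2 are placed in protein order and joined with a gap filler, while c3 remains a single-contig scaffold. A real pipeline would also need to handle orientation, partial overlaps, and contigs that match several proteins.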

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

Reference: 48 articles.

1. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

2. SNP discovery via 454 transcriptome sequencing

3. Phylogenomics reveals a new ‘megagroup’ including most photosynthetic eukaryotes

4. ALLPATHS: De novo assembly of whole-genome shotgun microreads

5. Hunting hidden transcripts

Cited by 288 articles.

1. Hepatic transcriptome profiling of largemouth bass (Micropterus salmoides Lacépède) injected with Flavobacterium covae or lipopolysaccharide;Journal of Fish Diseases;2024-04

2. Roast: a tool for reference-free optimization of supertranscriptome assemblies;BMC Bioinformatics;2024-01-02

3. TransIntegrator: capture nearly full protein-coding transcript variants via integrating Illumina and PacBio transcriptomes;Briefings in Bioinformatics;2023-09-22

4. Metabolomics and transcriptomics analyses for characterizing the alkaloid metabolism of Chinese jujube and sour jujube fruits;Frontiers in Plant Science;2023-09-18

5. Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis;Scientific Reports;2023-07-31