Author:
Kang Ji-Nam,Hur Mok,Kim Chang-Kug,Yang So-Hee,Lee Si-Myung
Abstract
Astragalus membranaceus is a medicinal plant mainly used in East Asia and contains abundant secondary metabolites. Despite the importance of this plant, the available genomic and genetic information is still limited. De novo transcriptome construction is recognized as an essential method for transcriptome research when reference genome information is incomplete. In this study, we constructed three individual transcriptome sets (unigene sets) for detailed analysis of the phenylpropanoid biosynthesis pathway, a major metabolite of A. membranaceus. Set-1 was a circular consensus sequence (CCS) generated using PacBio sequencing (PacBio-seq). Set-2 consisted of hybridized assembled unigenes with Illumina sequencing (Illumina-seq) reads and PacBio CCS using rnaSPAdes. Set-3 unigenes were assembled from Illumina-seq reads using the Trinity software. Construction of multiple unigene sets provides several advantages for transcriptome analysis. First, it provides an appropriate expression filtering threshold for assembly-based unigenes: a threshold transcripts per million (TPM) ≥ 5 removed more than 88% of assembly-based unigenes, which were mostly short and low-expressing unigenes. Second, assembly-based unigenes compensated for the incomplete length of PacBio CCSs: the ends of the 5`/3` untranslated regions of phenylpropanoid-related unigenes derived from set-1 were incomplete, which suggests that PacBio CCSs are unlikely to be full-length transcripts. Third, more isoform unigenes could be obtained from multiple unigene sets; isoform unigenes missing in Set-1 were detected in set-2 and set-3. Finally, gene ontology and Kyoto Encyclopedia of Genes and Genomes analyses showed that phenylpropanoid biosynthesis and carbohydrate metabolism were highly activated in A. membranaceus roots. Various sequencing technologies and assemblers have been developed for de novo transcriptome analysis. However, no technique is perfect for de novo transcriptome analysis, suggesting the need to construct multiple unigene sets. This method enables efficient transcript filtering and detection of longer and more diverse transcripts.