Abstract
Dinoflagellates are a diverse group of phytoplankton, ranging from harmful bloom-forming microalgae to photosymbionts that are critical for sustaining coral reefs. Genome and transcriptome data from dinoflagellates are revealing extensive genomic divergence and lineage-specific innovation of gene functions. However, most studies thus far have focused on protein-coding genes; long non-coding RNAs (lncRNAs), which are known to regulate gene expression in eukaryotes, remain largely unexplored. Here, using both genome and transcriptome data, we identified a combined total of 48,039 polyadenylated lncRNAs in the genomes of three dinoflagellate species: the coral symbionts Cladocopium proliferum and Durusdinium trenchii, and the bloom-forming Prorocentrum cordatum. Compared with protein-coding sequences, these putative lncRNAs are shorter, have fewer introns, and have lower G+C content. Although 37,768 (78.6%) lncRNAs shared no significant sequence similarity with one another, we classified all lncRNAs into distinct clusters based on conserved sequence motifs (k-mers); these clusters reflect potential protein-binding and/or subcellular-localisation properties. Notably, 3,708 (7.7%) lncRNAs were differentially expressed in response to heat stress, lifestyle, and/or growth phase, and they shared co-expression patterns with protein-coding genes. Based on inferred triplex interactions between lncRNAs and the upstream (putative promoter) regions of protein-coding genes, we identified a combined total of 19,460 putative gene targets for 3,721 lncRNAs; 907 of these genes were differentially expressed under heat stress. These results reveal, for the first time, the functional diversity of lncRNAs in dinoflagellates, and demonstrate how lncRNAs, often overlooked in transcriptome data, could regulate gene expression as a molecular response to heat stress in these ecologically important organisms.
Publisher
Cold Spring Harbor Laboratory