Merging short and stranded long reads improves transcript assembly

Published:2023-10-26 Issue:10 Volume:19 Page:e1011576
ISSN:1553-7358
Container-title:PLOS Computational Biology
language:en
Short-container-title:PLoS Comput Biol

Author:

Kainth Amoldeep S., Haddad Gabriela A., Hall Johnathon M., Ruthenburg Alexander J.

Abstract

Long-read RNA sequencing has arisen as a counterpart to short-read sequencing, with the potential to capture full-length isoforms, albeit at the cost of lower depth. Yet this potential is not fully realized due to inherent limitations of current long-read assembly methods and underdeveloped approaches to integrate short-read data. Here, we critically compare the existing methods and develop a new integrative approach to characterize a particularly challenging pool of low-abundance long noncoding RNA (lncRNA) transcripts from short- and long-read sequencing in two distinct cell lines. Our analysis reveals severe limitations in each of the sequencing platforms. For short-read assemblies, coverage declines at transcript termini resulting in ambiguous ends, and uneven low coverage results in segmentation of a single transcript into multiple transcripts. Conversely, long-read sequencing libraries lack depth and strand-of-origin information in cDNA-based methods, culminating in erroneous assembly and quantitation of transcripts. We also discover a cDNA synthesis artifact in long-read datasets that markedly impacts the identity and quantitation of assembled transcripts. Towards remediating these problems, we develop a computational pipeline to “strand” long-read cDNA libraries that rectifies inaccurate mapping and assembly of long-read transcripts. Leveraging the strengths of each platform and our computational stranding, we also present and benchmark a hybrid assembly approach that drastically increases the sensitivity and accuracy of full-length transcript assembly on the correct strand and improves detection of biological features of the transcriptome. When applied to a challenging set of under-annotated and cell-type variable lncRNA, our method resolves the segmentation problem of short-read sequencing and the depth problem of long-read sequencing, resulting in the assembly of coherent transcripts with precise 5’ and 3’ ends. 
Our workflow can be applied to existing datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.

Funder

National Institute of General Medical Sciences

Publisher

Public Library of Science (PLoS)

Subject

Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics

Reference: 103 articles.

1. RNA-Seq: a revolutionary tool for transcriptomics;Z Wang;Nat Rev Genet,2009

2. Coming of age: ten years of next-generation sequencing technologies;S Goodwin;Nat Rev Genet,2016

3. Analysis of error profiles in deep next-generation sequencing data;X Ma;Genome Biol,2019

4. RNA sequencing: the teenage years;R Stark;Nat Rev Genet,2019

5. Systematic evaluation of spliced alignment programs for RNA-seq data;PG Engstrom;Nat Methods,2013

Cited by 6 articles.

1. Environmental community transcriptomics: strategies and struggles;Briefings in Functional Genomics;2024-08-24

2. LsRTDv1, a reference transcript dataset for accurate transcript‐specific expression analysis in lettuce;The Plant Journal;2024-08-15

3. Full-Length Transcriptome Assembly of Platycladus orientalis Root Integrated with RNA-Seq to Identify Genes in Response to Root Pruning;Forests;2024-07-15

4. Long‐read RNA‐Seq for the discovery of long noncoding and antisense RNAs in plant organelles;Physiologia Plantarum;2024-07

5. HyDRA: a pipeline for integrating long- and short-read RNAseq data for custom transcriptome assembly;2024-06-27