Abstract
Genes in SARS-CoV-2 and, more generally, in viruses of the order Nidovirales are expressed by a process of discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. This process is distinct from alternative splicing in eukaryotes, rendering current transcript assembly methods unsuitable for Nidovirales sequencing samples. Here, we introduce the Discontinuous Transcript Assembly problem of finding transcripts and their abundances given an alignment, under a maximum likelihood model that accounts for varying transcript lengths. Underpinning our approach is the concept of a segment graph, a directed acyclic graph that, unlike the splice graph used to characterize alternative splicing, has a unique Hamiltonian path. We provide a compact characterization of solutions as subsets of non-overlapping edges in this graph, enabling the formulation of an efficient mixed integer linear program. We show using simulations that our method, Jumper, drastically outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1 and SARS-CoV-2 samples, we find that Jumper not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are well supported by direct evidence from long-read data, by presence in multiple independent samples, or by a conserved core sequence. Jumper enables detailed analyses of Nidovirales transcriptomes.
Code availability
Software is available at https://github.com/elkebir-group/Jumper
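The segment graph idea described in the abstract can be illustrated with a small sketch. This is not Jumper's implementation; it only assumes what the abstract states: segments of the genome are nodes, edges between adjacent segments form the single Hamiltonian path, and a transcript corresponds to a set of non-overlapping discontinuous edges. The function names (`build_segment_graph`, `transcript_path`), the coordinate convention, and the toy junctions are hypothetical.

```python
# Illustrative sketch only; names and coordinates are assumptions, not Jumper's API.

def build_segment_graph(genome_length, junctions):
    """Build a toy segment graph.

    junctions: list of (start, end) positions with 0 < start < end < genome_length,
    meaning the polymerase jumps from `start` to `end`, skipping [start, end).
    Returns (segments, continuous_edges, discontinuous_edges), where edges are
    pairs of segment indices.
    """
    # Junction positions split the genome into half-open segments [a, b).
    points = sorted({0, genome_length} | {p for j in junctions for p in j})
    segments = list(zip(points[:-1], points[1:]))
    ends_at = {b: i for i, (_, b) in enumerate(segments)}
    starts_at = {a: i for i, (a, _) in enumerate(segments)}
    # Continuous edges join adjacent segments; together they form the
    # Hamiltonian path 0 -> 1 -> ... -> n-1 of the directed acyclic graph.
    continuous = [(i, i + 1) for i in range(len(segments) - 1)]
    # Each junction becomes a discontinuous edge that skips at least one segment.
    discontinuous = [(ends_at[s], starts_at[e]) for s, e in junctions]
    return segments, continuous, discontinuous


def transcript_path(n_segments, jumps):
    """Return the segment path determined by a set of discontinuous edges.

    Raises ValueError if two chosen edges overlap, since such edges cannot
    lie on the same path through the graph.
    """
    jumps = sorted(jumps)
    for (_, v1), (u2, _) in zip(jumps, jumps[1:]):
        if v1 > u2:
            raise ValueError("discontinuous edges overlap")
    jump_from = dict(jumps)
    path, i = [], 0
    while i < n_segments:
        path.append(i)
        # Take the chosen jump if one leaves this segment, else continue.
        i = jump_from.get(i, i + 1)
    return path


segments, continuous, discontinuous = build_segment_graph(
    100, [(10, 60), (10, 80), (70, 90)]
)
print(segments)
print(discontinuous)
print(transcript_path(len(segments), [(0, 2), (2, 5)]))
```

In this toy setting, choosing no discontinuous edges gives the full-length path along the Hamiltonian path, and each valid subset of non-overlapping discontinuous edges gives exactly one candidate transcript, which is the kind of compact characterization the abstract refers to.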
Publisher
Cold Spring Harbor Laboratory