Challenges in identifying mRNA transcript starts and ends from long-read sequencing data

Published: 2023-07-27

Authors:

Calvo-Roitberg Ezequiel, Daniels Rachel F., Pai Athma A.

Abstract

Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.

Publisher

Cold Spring Harbor Laboratory

References: 41 articles.

1. Defining a personal, allele-specific, and single-molecule long-read transcriptome

2. Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing

3. Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes;BMC Genomics,2022

4. Enhanced protein isoform characterization through long-read proteogenomics;Genome Biol,2022

5. Characterization of the human ESC transcriptome by hybrid sequencing

Cited by 7 articles.

1. Exploring the transcriptomic profile of human monkeypox virus via CAGE and native RNA sequencing approaches;mSphere;2024-08-27

2. A robust and flexible baculovirus-insect cell system for AAV vector production with improved yield, capsid ratios and potency;Molecular Therapy - Methods & Clinical Development;2024-06

3. Exploring the Transcriptomic Profile of Human Monkeypox Virus via CAGE and Native RNA Sequencing Approaches;2024-05-01

4. KSHV 3.0: a state-of-the-art annotation of the Kaposi’s sarcoma-associated herpesvirus transcriptome using cross-platform sequencing;mSystems;2024-01-11

5. mRNA initiation and termination are spatially coordinated;2024-01-07