Full-length PacBio Amplicon Sequencing to Unveil RNA Editing Sites
Published:2023-08-03
Issue:
Volume:18
Page:
ISSN:1574-8936
Container-title:Current Bioinformatics
language:en
Short-container-title:CBIO
Author:
Zhu Xiao-Lu (1),
Liao Ming-Ling (1),
Zhu Ya-Jie (1),
Dong Yun-Wei (1)
Affiliation:
1. Key Laboratory of Mariculture of Ministry of Education, College of Fisheries, Ocean University of China, Qingdao 266003, China
Abstract
Background:
RNA editing diversifies transcripts through post-transcriptional sequence changes. Currently, RNA editing sites are mostly detected with Sanger sequencing or second-generation sequencing. However, Sanger-based detection is limited by interfering background peaks when direct sequencing is used and by the number of clones when clone sequencing is used, while second-generation sequencing is constrained by its short read length.
Objective:
We aimed to design a pipeline that accurately detects RNA editing sites in full-length long-read amplicons, for studies that focus on a few specific genes of interest.
Method:
We developed a novel high-throughput pipeline for detecting RNA editing sites based on PacBio circular consensus sequencing, which combines high accuracy with high throughput and long-read coverage. We tested the pipeline on cytosolic malate dehydrogenase in the hard-shelled mussel Mytilus coruscus and validated the results by direct Sanger sequencing.
Results:
PacBio circular consensus sequencing (CCS) amplicon data from three mussels were first filtered by quality and then selected by open reading frame. After filtering, 225-2047 sequences per mussel were used to identify RNA editing sites. By comparison with the corresponding genomic DNA sequences, we extracted 227-799 candidate RNA editing sites, excluding heterozygous sites. We then narrowed these to 7-11 final RNA editing sites using a new error model designed specifically for RNA editing site detection. All resulting RNA editing sites agreed with the Sanger sequencing validation.
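The steps described in the Results (open reading frame selection, comparison with genomic DNA, exclusion of heterozygous sites, and an error-model filter) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. In particular, the one-sided binomial test stands in for the paper's error model, which the abstract does not describe. The function names, the per-base error rate, and the significance threshold are all hypothetical. The sketch also assumes that reads contain no indels and that the genomic reference is the coding sequence of the same length.

```python
from collections import Counter
from math import exp, lgamma, log, log1p

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_full_orf(read, expected_len):
    """Keep reads that form one intact ORF: ATG start, single terminal stop."""
    if len(read) != expected_len or len(read) % 3 != 0:
        return False
    if not read.startswith("ATG"):
        return False
    codons = [read[i:i + 3] for i in range(0, len(read), 3)]
    if codons[-1] not in STOP_CODONS:
        return False
    return not any(c in STOP_CODONS for c in codons[:-1])


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    def log_pmf(i):
        log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return log_choose + i * log(p) + (n - i) * log1p(-p)
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


def call_editing_sites(reads, genomic, heterozygous=frozenset(),
                       error_rate=0.001, alpha=1e-6):
    """Return (position, genomic_base, read_base, count, depth) tuples.

    error_rate and alpha are illustrative assumptions, not values from the
    paper. Positions are 0-based offsets into the genomic sequence.
    """
    depth = len(reads)
    sites = []
    for pos, ref_base in enumerate(genomic):
        if pos in heterozygous:
            continue  # genomic variation, not RNA editing
        counts = Counter(read[pos] for read in reads)
        for base, count in sorted(counts.items()):
            if base == ref_base:
                continue
            # Keep the mismatch only if sequencing error alone is an
            # implausible explanation for this many supporting reads.
            if binomial_tail(count, depth, error_rate) < alpha:
                sites.append((pos, ref_base, base, count, depth))
    return sites


def detect(raw_reads, genomic, heterozygous=frozenset()):
    """Filter reads by ORF, then call candidate editing sites."""
    kept = [r for r in raw_reads if has_full_orf(r, len(genomic))]
    return call_editing_sites(kept, genomic, heterozygous)
```

In this sketch a mismatch supported by a single read out of a hundred is treated as sequencing noise, while one supported by a large fraction of reads is reported. Real CCS data would additionally need alignment and indel handling, which are omitted here.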
Conclusion:
We report a method with a near-zero error rate for identifying RNA editing sites in long-read amplicons using PacBio CCS sequencing.
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics,Genetics,Molecular Biology,Biochemistry