De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm

Published: 2018-11-06

Author:

Sahlin Kristoffer, Medvedev Paul

Abstract

Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in overall accuracy, in scalability to large datasets, or in both. Our tool is available at https://github.com/ksahlin/isONclust.
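The two ideas named in the abstract, greedy assignment and the use of quality values, can be illustrated with a minimal sketch. This is not the isONclust implementation: the ordering rule (mean per-base error probability), the plain k-mer overlap score, and the names `mean_error_prob`, `kmers`, `greedy_cluster`, `k`, and `min_shared` are all assumptions made for illustration, and the abstract does not specify any of them. Reads are assumed to be non-empty and to carry Phred+33 quality strings.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical read record: (read id, sequence, Phred+33 quality string).
Read = Tuple[str, str, str]


def mean_error_prob(qual: str) -> float:
    """Average per-base error probability decoded from a Phred+33 string."""
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return sum(probs) / len(probs)


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(
    reads: List[Read], k: int = 5, min_shared: float = 0.5
) -> Dict[str, List[str]]:
    """Single-pass greedy clustering keyed by representative read id."""
    # Assumption: visit the most reliable reads first, so that cluster
    # representatives tend to be low-error reads.
    ordered = sorted(reads, key=lambda r: mean_error_prob(r[2]))
    reps: List[Tuple[str, Set[str]]] = []
    clusters: Dict[str, List[str]] = {}
    for read_id, seq, _ in ordered:
        read_kmers = kmers(seq, k)
        best_id, best_frac = None, 0.0
        # Compare the read only against representatives, never all reads.
        for rep_id, rep_kmers in reps:
            frac = len(read_kmers & rep_kmers) / max(len(read_kmers), 1)
            if frac > best_frac:
                best_id, best_frac = rep_id, frac
        if best_id is not None and best_frac >= min_shared:
            clusters[best_id].append(read_id)
        else:
            # No representative is similar enough: open a new cluster.
            reps.append((read_id, read_kmers))
            clusters[read_id] = [read_id]
    return clusters
```

Each read is compared only against the current representatives and is never revisited, which is what makes the pass greedy. The quality values enter only through the processing order in this sketch; a fixed `min_shared` threshold ignores per-read error rates, which a quality-aware method would presumably account for.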

Publisher

Cold Spring Harbor Laboratory

Cited by 10 articles.

1. Molecular phylogeny of fucoxanthin-chlorophyll a/c proteins from Chaetoceros gracilis and Lhcq/Lhcf diversity;2021-09-07

2. Developmental validation of Oxford Nanopore Technology MinION sequence data and the NGSpeciesID bioinformatic pipeline for forensic genetic species identification;Forensic Science International: Genetics;2021-07

3. Representation of k-Mer Sets Using Spectrum-Preserving String Sets;Journal of Computational Biology;2021-04-01

4. Methodologies for Transcript Profiling Using Long-Read Technologies;Frontiers in Genetics;2020-07-07

5. The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read tools;2020-06-29