Abstract
AbstractThe characterization of human gene expression is limited by short read lengths, high error rates and large input requirements. Here, we used a synthetic long read (SLR) sequencing approach, LoopSeq, to generate accurate sequencing reads that span full length transcripts using standard short read data. LoopSeq identified isoforms from control samples with 99.4% accuracy and a 0.01% per-base error rate, exceeding the accuracy reported for other long-read technologies. Applied to targeted transcriptome sequencing from colon cancers and their metastatic counterparts, LoopSeq revealed large scale isoform redistributions from benign colon mucosa to primary colon cancer and metastatic cancer and identified several previously unknown fusion isoforms. Strikingly, single nucleotide variants (SNVs) occurred dominantly in specific isoforms and some SNVs underwent isoform switching in cancer progression. The ability to use short reads to generate accurate long-read data as the raw unit of information holds promise as a widely accessible approach in transcriptome sequencing.
Funder
U.S. Department of Health & Human Services | NIH | National Cancer Institute
United States Department of Defense | United States Army | Army Medical Command | Congressionally Directed Medical Research Programs
Publisher
Springer Science and Business Media LLC
Subject
General Agricultural and Biological Sciences,General Biochemistry, Genetics and Molecular Biology,Medicine (miscellaneous)
Cited by
17 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献