Author:
Tischler German, Myers Eugene W.
Abstract
While second generation sequencing led to a vast increase in sequenced data, the shorter reads that came with it made assembly a much harder task, and for some regions impossible with short read data alone. This changed again with the advent of third generation long read sequencers. The length of the long reads allows a much better resolution of repetitive regions; their high error rate, however, is a major challenge. Using the data successfully requires removing most of the sequencing errors. The first hybrid correction methods used low noise second generation data to correct third generation data, but this approach has issues when repeats make it unclear where to place the short reads, and also because second generation sequencers fail to sequence some regions that third generation sequencers can cover. Non hybrid methods appeared later. We present a new method for non hybrid long read error correction based on de Bruijn graph assembly of short windows of long reads, with subsequent combination of these corrected windows into corrected long reads. Our experiments show that this method yields a better correction than other state of the art non hybrid correction approaches.
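To make the window-level idea concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes that the noisy fragments covering one short window have already been extracted, counts their k-mers, discards rare k-mers as likely errors, and walks greedily through the remaining de Bruijn graph. The function names `kmers` and `window_consensus`, the parameters `k` and `min_count`, and the greedy traversal are all illustrative assumptions; the step that combines corrected windows into full reads is not shown.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq in left-to-right order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def window_consensus(fragments, k=4, min_count=2):
    """Toy consensus for one window (hypothetical helper, not the paper's code).

    Builds a frequency-filtered de Bruijn graph from the fragments and
    follows the most frequent unvisited successor at every step.
    """
    counts = Counter(km for frag in fragments for km in kmers(frag, k))
    # Assumption: k-mers seen fewer than min_count times are sequencing errors.
    solid = {km: c for km, c in counts.items() if c >= min_count}
    # Assumption: start at the solid k-mer that most often begins a fragment.
    starts = Counter(frag[:k] for frag in fragments if frag[:k] in solid)
    if not starts:
        return ""
    node = starts.most_common(1)[0][0]
    path, seen = node, {node}
    while True:
        # Successors share a (k-1)-base overlap with the current k-mer.
        candidates = [node[1:] + base for base in "ACGT"]
        candidates = [c for c in candidates if c in solid and c not in seen]
        if not candidates:
            break
        node = max(candidates, key=lambda c: solid[c])
        seen.add(node)
        path += node[-1]
    return path


# Three clean copies of a window plus one copy with a substitution error.
reads = ["ACGTTGCAAGCT"] * 3 + ["ACGTAGCAAGCT"]
print(window_consensus(reads))  # prints ACGTTGCAAGCT
```

In this example the k-mers introduced by the single substitution occur only once, so the frequency filter removes them and the walk recovers the error-free window. A greedy walk is the simplest possible traversal; it is used here only for illustration and would not handle repeats or ties robustly.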
Publisher
Cold Spring Harbor Laboratory
Cited by
22 articles.