Ariadne: Barcoded Linked-Read Deconvolution Using de Bruijn Graphs
Mak Lauren, Meleshko Dmitry, Danko David C., Barakzai Waris N., Belchikov Natan, Hajirasouliha Iman
Abstract
De novo assemblies are critical for capturing the genetic composition of complex samples. Linked-read sequencing techniques such as 10x Genomics’ Linked-Reads, UST’s TELL-Seq, Loop Genomics’ LoopSeq, and BGI’s Long Fragment Read combine 3′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base pairs. The application of linked-read sequencing to genome assembly has demonstrated that barcoding-based technologies balance the tradeoffs between long-range linkage, per-base coverage, and cost. Linked reads come with their own challenges, chief among them the association of multiple long fragments with the same 3′ barcode. The lack of a unique correspondence between a long fragment and a barcode, in conjunction with low sequencing depth, confounds the assignment of linkage between short reads.

Results
We introduce Ariadne, a novel linked-read deconvolution algorithm based on assembly graphs that can be used to extract single-species read sets from a large linked-read dataset. Ariadne deconvolution of linked-read clouds increases the proportion of read clouds containing only reads from a single fragment by up to 37.5-fold. Using these enhanced read clouds in de novo assembly significantly improves assembly contiguity and the size of the largest aligned blocks compared with the non-deconvolved read clouds. Integrating barcode deconvolution tools such as Ariadne into the postprocessing pipeline for linked-read technologies increases the quality of de novo assembly for complex populations such as microbiomes. Ariadne is intuitive, computationally efficient, and scalable to other large-scale linked-read problems, such as human genome phasing.

Availability
The source code is available on GitHub:
Cold Spring Harbor Laboratory