Indel-correcting DNA barcodes for high-throughput sequencing

Published: 2018-06-20
Volume: 115
Issue: 27
Pages: E6217-E6226
ISSN: 0027-8424
Journal: Proceedings of the National Academy of Sciences
Short journal title: Proc Natl Acad Sci USA
Language: en

Authors:

John A. Hawkins, Stephen K. Jones, Ilya J. Finkelstein, William H. Press

Abstract

Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10^6 single-error–correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10^15 error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.
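The abstract's key observation is that a barcode is read as a fixed-length window at the start of a longer read. A deletion therefore pulls a downstream base into the window (filling), and an insertion pushes the last barcode base out (truncation). The minimal Python sketch below illustrates why that trips up ordinary Levenshtein decoding. It assumes a right-end-free edit distance computed like a "sequence-Levenshtein" distance (the minimum over the last row and last column of the Levenshtein table). This is an illustrative stand-in, not the paper's FREE decoder or its distributed software. The function names, the three-barcode list, and the downstream sequence are all hypothetical.

```python
from typing import List, Optional, Sequence


def edit_table(a: str, b: str) -> List[List[int]]:
    """Standard Levenshtein DP table: d[i][j] = distance(a[:i], b[:j])."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d


def levenshtein(a: str, b: str) -> int:
    """Ordinary Levenshtein distance (both right ends must align)."""
    return edit_table(a, b)[-1][-1]


def right_end_free_distance(a: str, b: str) -> int:
    """Edit distance that does not charge for bases past the right end.

    ASSUMPTION: modeled on the sequence-Levenshtein idea, i.e. the minimum
    over the last row and last column of the Levenshtein table, so bases
    pushed out (truncation) or pulled in (filling) at the right end of a
    fixed-length window are not counted as extra edits.
    """
    d = edit_table(a, b)
    return min(min(d[-1]), min(row[-1] for row in d))


def decode(observed: str, barcodes: Sequence[str],
           max_errors: int = 1) -> Optional[str]:
    """Return the unique closest barcode within max_errors, else None."""
    scored = sorted((right_end_free_distance(observed, bc), bc)
                    for bc in barcodes)
    best_dist, best = scored[0]
    if best_dist > max_errors:
        return None  # too far from every barcode
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: two barcodes tie
    return best


# Hypothetical example: an 8-nt barcode followed by downstream library sequence.
barcodes = ["ACGTTGCA", "TGCAACGT", "GATCCTAG"]  # illustrative, not a FREE list
read = "ACGTTGCA" + "GGAT"
observed = (read[:2] + read[3:])[:8]  # delete the G at index 2; read 8 bases

print(observed)                                       # ACTTGCAG
print(levenshtein(observed, "ACGTTGCA"))              # 2
print(right_end_free_distance(observed, "ACGTTGCA"))  # 1
print(decode(observed, barcodes))                     # ACGTTGCA
```

Under these assumptions, one deletion costs 2 edits under ordinary Levenshtein distance (the missing G plus the extra base filled in at the right end) but only 1 under the right-end-free distance, so a single-error-correcting decoder can still recover the barcode. A real barcode list would also need a minimum pairwise distance large enough for the chosen `max_errors`. This toy list is not checked for that.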

Funders

Welch Foundation

HHS | National Institutes of Health

Publisher

Proceedings of the National Academy of Sciences

Subject

Multidisciplinary

Cited by 57 articles

1. Droplet-based single-cell sequencing: Strategies and applications; Biotechnology Advances; 2024-09

2. Cryptographic approaches to authenticating synthetic DNA sequences; Trends in Biotechnology; 2024-08

3. Position-dependent function of human sequence-specific transcription factors; Nature; 2024-07-17

4. Generating barcodes for nanopore sequencing data with PRO; Fundamental Research; 2024-07

5. Functional phenotyping of genomic variants using multiomic scDNA-scRNA-seq; 2024-06-01