UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

Published: 2017-01-18; Issue: 3; Volume: 27; Pages: 491-499
ISSN: 1088-9051
Container-title: Genome Research
Language: en
Short-container-title: Genome Res.

Author:

Smith Tom, Heger Andreas, Sudbery Ian

Abstract

Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package.
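The abstract names "network-based methods" without describing them, so the following is only a minimal sketch of the general idea, not the UMI-tools implementation. It assumes the simplest possible variant: at a single mapping position, UMIs that differ by at most one base are linked, and each connected component of the resulting network is counted as one original molecule. The function names `hamming` and `count_molecules` and the `max_dist` parameter are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_molecules(umis, max_dist=1):
    """Estimate the number of distinct molecules at one mapping position.

    Assumption (not taken from the source): UMIs within `max_dist`
    mismatches are linked, and every connected component of that
    network represents a single original molecule.
    """
    counts = Counter(umis)              # collapse identical UMIs
    parent = {u: u for u in counts}     # union-find forest over unique UMIs

    def find(u):
        # Follow parent pointers to the component root, halving the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Link every pair of unique UMIs that could differ only by sequencing error.
    for a, b in combinations(counts, 2):
        if hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)

    # Each distinct root is one inferred molecule.
    return len({find(u) for u in counts})


reads = ["ATTG"] * 10 + ["ATTC"] + ["GGCA"] * 5
print(len(set(reads)))         # naive count of unique UMIs: 3
print(count_molecules(reads))  # network-based count: 2
```

In this toy input, the single `ATTC` read is one mismatch away from the abundant `ATTG` and is therefore merged with it, whereas counting unique UMI sequences would report three molecules. Pairwise comparison is quadratic in the number of unique UMIs per position, which is acceptable for a sketch but would need indexing for large inputs.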

Funder

Medical Research Council

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

References: 33 articles.

1. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries

2. Trimmomatic: a flexible trimmer for Illumina sequence data

3. Scalable microfluidics for single-cell RNA printing and sequencing

4. High-throughput and quantitative genome-wide messenger RNA sequencing for molecular phenotyping; BMC Genomics, 2015

5. Kraken: A set of tools for quality control and analysis of high-throughput sequence data

Cited by 1459 articles.

1. Cultivation driven transcriptomic changes in the wild-type and mutant strains of Rhodospirillum rubrum; Computational and Structural Biotechnology Journal; 2024-12

2. Stratified analyses refine association between TLR7 rare variants and severe COVID-19; Human Genetics and Genomics Advances; 2024-10

3. IRescue: uncertainty-aware quantification of transposable elements expression at single cell level; Nucleic Acids Research; 2024-09-13

4. Cilia defects upon loss of WDR4 are linked to proteasomal hyperactivity and ubiquitin shortage; Cell Death & Disease; 2024-09-09

5. Deciphering the Cell-Specific Transcript Heterogeneity and Alternative Splicing during the Early Embryonic Development of Zebrafish; 2024-09-08