Accurate Estimation of Molecular Counts from Amplicon Sequence Data with Unique Molecular Identifiers

Published: 2022-06-16

Author:

Peng Xiyu, Dorman Karin S

Abstract

Motivation: Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured abundances are distorted by stochasticity and bias in amplification, as well as by errors during polymerase chain reaction (PCR) and sequencing. One solution attaches Unique Molecular Identifiers (UMIs) to sample sequences before amplification, which eliminates amplification bias: reads are clustered on UMI, and the clusters are counted to quantify abundance. While modern methods improve over naïve clustering by UMI identity, most do not account for UMI reuse, or collision, and they do not adequately model PCR and sequencing errors in the UMIs and sample sequences.

Results: We introduce Deduplication and accurate Abundance estimation with UMIs (DAUMI), a probabilistic framework to detect true biological sequences and accurately estimate their deduplicated abundance from amplicon sequence data. DAUMI recognizes UMI collision, even on highly similar sequences, and detects and corrects most PCR and sequencing errors in the UMI and sampled sequences. DAUMI performs better on simulated and real data compared to other UMI-aware clustering methods.

Availability: Source code is available at https://github.com/xiyupeng/AmpliCI-UMI.
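
For orientation, the naïve clustering by UMI identity that the abstract uses as its baseline can be sketched in a few lines of Python. This is an illustrative toy only, not DAUMI and not code from the AmpliCI-UMI repository: the function name `naive_umi_abundance`, the `(umi, sequence)` read format, and the example reads are all assumptions made for the sketch.

```python
from collections import Counter, defaultdict


def naive_umi_abundance(reads):
    """Naive deduplication: one cluster per exact UMI string.

    `reads` is an iterable of (umi, sequence) pairs (a hypothetical format).
    Each UMI cluster contributes one count to its most frequent sequence.
    """
    clusters = defaultdict(Counter)
    for umi, sequence in reads:
        clusters[umi][sequence] += 1

    abundance = Counter()
    for seq_counts in clusters.values():
        consensus, _ = seq_counts.most_common(1)[0]
        abundance[consensus] += 1
    return dict(abundance)


# Toy data: three true molecules, two carrying ACGT and one carrying ACGA.
reads = [
    ("AAAC", "ACGT"), ("AAAC", "ACGT"), ("AAAC", "ACGT"),  # molecule 1
    ("AAAT", "ACGT"),                   # molecule 1 read with a UMI error
    ("CCTG", "ACGT"), ("CCTG", "ACGT"),                    # molecule 2
    ("CCTG", "ACGA"),                   # molecule 3 reusing UMI CCTG (collision)
]

counts = naive_umi_abundance(reads)
print(counts)
```

In this toy input the true abundances are 2 for ACGT and 1 for ACGA. The naive count reports 3 for ACGT and misses ACGA entirely: the UMI error creates a spurious extra cluster, and the collision hides the rare variant inside another molecule's cluster. These are the two failure modes the abstract says DAUMI is designed to address.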

Publisher

Cold Spring Harbor Laboratory
