Estimating error rates for single molecule protein sequencing experiments

Published: 2023-07-19

Author:

Smith Matthew Beauregard, VanderVelden Kent, Blom Thomas, Stout Heather D., Mapes James H., Folsom Tucker M., Martin Christopher, Bardo Angela M., Marcotte Edward M.

Abstract

The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. The first, a Hidden Markov Model (HMM) based approach, extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique for estimating the transition probabilities of an HMM using expectation maximization, modifying it to estimate a small number of parameter values directly rather than estimating every transition probability independently, which should help prevent overfitting. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This augmentation, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second, independent implementation was also developed, using a hybrid of DIRECT and Powell’s method to reduce the root mean squared error (RMSE) between simulations and the real dataset. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell’s method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.
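The idea of estimating a few shared parameters instead of every transition probability can be illustrated with a toy sketch. It is not the whatprot implementation: it assumes a much simpler model in which the hidden state is the number of dyes still attached, each dye is lost independently with one probability p per cycle, and intensities are Gaussian around the dye count. The names `trans`, `emit`, `em_update`, `simulate`, `fit`, and the constants `MU` and `SIGMA` are hypothetical, and missed cleavages, peptide detachment, and N-terminal blocking are left out.

```python
import math
import random

MU, SIGMA = 1.0, 0.2  # assumed per-dye intensity and noise (illustrative values)

def trans(k, j, p):
    """P(k dyes -> j dyes) in one cycle; each dye is lost independently with prob p."""
    return math.comb(k, j) * p ** (k - j) * (1 - p) ** j if j <= k else 0.0

def emit(x, k):
    """Gaussian density of observed intensity x given k attached dyes (assumed)."""
    z = (x - MU * k) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2 * math.pi))

def normalize(v):
    s = sum(v)
    return [a / s for a in v]

def em_update(reads, p, n_dyes):
    """One EM iteration: forward-backward E-step, then a tied M-step for p."""
    S = n_dyes + 1
    A = [[trans(k, j, p) for j in range(S)] for k in range(S)]
    lost = at_risk = 0.0
    for obs in reads:
        T = len(obs)
        B = [[emit(x, k) for k in range(S)] for x in obs]
        # Forward pass, rescaled at every step; each read starts with all dyes.
        alpha = [[0.0] * S for _ in range(T)]
        alpha[0][n_dyes] = 1.0
        for t in range(1, T):
            alpha[t] = normalize([
                B[t][j] * sum(alpha[t - 1][k] * A[k][j] for k in range(S))
                for j in range(S)])
        # Backward pass, also rescaled at every step.
        beta = [[1.0] * S for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = normalize([
                sum(A[k][j] * B[t + 1][j] * beta[t + 1][j] for j in range(S))
                for k in range(S)])
        # Posterior weight of each transition k -> j at each step.
        for t in range(T - 1):
            xi = [[alpha[t][k] * A[k][j] * B[t + 1][j] * beta[t + 1][j]
                   for j in range(S)] for k in range(S)]
            total = sum(map(sum, xi))
            for k in range(S):
                for j in range(S):
                    w = xi[k][j] / total
                    lost += w * (k - j)   # expected dyes lost
                    at_risk += w * k      # expected dyes that could be lost
    # Tied M-step: a single p is shared by every transition in the model.
    return lost / at_risk

def simulate(n_reads, n_cycles, p, n_dyes, rng):
    """Draw synthetic reads from the same toy model."""
    reads = []
    for _ in range(n_reads):
        k, obs = n_dyes, []
        for t in range(n_cycles):
            if t > 0:
                k = sum(rng.random() >= p for _ in range(k))
            obs.append(rng.gauss(MU * k, SIGMA))
        reads.append(obs)
    return reads

def fit(reads, n_dyes, p0=0.5, n_iter=30):
    """Iterate the tied update from an initial guess p0."""
    p = p0
    for _ in range(n_iter):
        p = em_update(reads, p, n_dyes)
    return p

if __name__ == "__main__":
    reads = simulate(500, 8, 0.1, 2, random.Random(0))
    print(fit(reads, 2))
```

Standard Baum-Welch would re-estimate each entry of the transition matrix separately. Here the M-step pools the expected counts from all transitions into one ratio, so the number of free parameters stays at one however many states the model has.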

Publisher

Cold Spring Harbor Laboratory

References: 25 articles.

1. Protein Sequencing, One Molecule at a Time

2. The emerging landscape of single-molecule protein sequencing technologies

3. Paving the way to single-molecule protein sequencing

4. Leveraging nature’s biomolecular designs in next-generation protein sequencing reagent development

5. Label-Free Optical Analysis of Biomolecules in Solid-State Nanopores: Toward Single-Molecule Protein Sequencing

Cited by 2 articles.

1. Review and Computational Study on Practicality of Derivative-Free DIRECT-Type Methods. Informatica, 2024.

2. Robust and scalable single-molecule protein sequencing with fluorosequencing. 2023-09-16.