PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins

Published: 2023-12-18 Issue: 1 Volume: 24
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Marrama Daniel, Chronister William D., Westernberg Luise, Vita Randi, Koşaloğlu-Yalçın Zeynep, Sette Alessandro, Nielsen Morten, Greenbaum Jason A., Peters Bjoern

Abstract

Background: Numerous tools exist for biological sequence comparison and search. One case of particular interest for immunologists is finding matches for linear peptide T cell epitopes, typically between 8 and 15 residues in length, in a large set of protein sequences, either as exact matches or as matches that account for residue substitutions. Such tools are critical in applications ranging from identifying conservation across viral epitopes and identifying putative epitope targets for allergens to finding matches for cancer-associated neoepitopes in order to examine the role of tolerance in tumor recognition.

Results: We defined a set of benchmarks that reflect the different practical applications of short peptide sequence matching. We evaluated a suite of existing methods for speed and recall and developed a new tool, PEPMatch. The tool uses a deterministic k-mer mapping algorithm that preprocesses proteomes before searching, achieving a 50-fold increase in speed over methods such as the Basic Local Alignment Search Tool (BLAST) without compromising recall. PEPMatch's code and benchmark datasets are publicly available.

Conclusions: PEPMatch offers significant speed and recall advantages for peptide sequence matching. While it is of immediate utility for immunologists, the developed benchmarking framework also provides a standard against which future tools can be evaluated for improvements. The tool is available at https://nextgen-tools.iedb.org, and the source code can be found at https://github.com/IEDB/PEPMatch.
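The k-mer mapping idea mentioned in the abstract can be illustrated with a small sketch. This is not PEPMatch's actual implementation or API. It is a minimal illustration that assumes an in-memory dictionary index and a pigeonhole-style seed-and-verify search, and the names `build_kmer_index` and `search`, the value of `k`, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(proteins, k):
    """Preprocessing step: map each k-mer to the (protein_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for protein_id, seq in proteins.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((protein_id, pos))
    return index


def search(peptide, proteins, index, k, max_mismatches=0):
    """Return sorted (protein_id, start, mismatches) hits for one peptide.

    The peptide is cut into non-overlapping k-mer seeds. If there are at
    least max_mismatches + 1 seeds, any window within the mismatch budget
    must contain one seed exactly (pigeonhole), so no hit is missed.
    """
    n_seeds = len(peptide) // k
    if n_seeds < max_mismatches + 1:
        raise ValueError("k is too large for this peptide and mismatch budget")
    hits = set()
    for seed in range(n_seeds):
        offset = seed * k
        for protein_id, pos in index.get(peptide[offset:offset + k], ()):
            start = pos - offset  # where the full peptide would begin
            seq = proteins[protein_id]
            if start < 0 or start + len(peptide) > len(seq):
                continue
            # Verify the candidate window by counting substitutions.
            window = seq[start:start + len(peptide)]
            mismatches = sum(a != b for a, b in zip(peptide, window))
            if mismatches <= max_mismatches:
                hits.add((protein_id, start, mismatches))
    return sorted(hits)


# Toy sequences (made up for illustration, not real proteome data).
proteins = {"P1": "MKTAYIAKQRQISFVKSHFSRQ", "P2": "GSHMKTAYLAKQRQ"}
index = build_kmer_index(proteins, k=3)
exact = search("KTAYIAKQR", proteins, index, k=3)
fuzzy = search("KTAYIAKQR", proteins, index, k=3, max_mismatches=1)
```

In this sketch the index is built once and reused across peptides, which is where preprocessing saves time: each query touches only the positions that share a seed with the peptide instead of scanning every protein.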

Funder

National Institutes of Health

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-023-05606-4.pdf

References: 20 articles.

1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10. https://doi.org/10.1016/S0022-2836(05)80360-2.

2. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 2004;5:113. https://doi.org/10.1186/1471-2105-5-113.

3. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using diamond. Nat Methods. 2015;12:59–60. https://doi.org/10.1038/nmeth.3176.

4. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35:1026–8. https://doi.org/10.1038/nbt.3988.

5. Trolle T, McMurtrey CP, Sidney J, Bardet W, Osborn SC, Kaever T, Sette A, Hildebrand WH, Nielsen M, Peters B. The length distribution of class I restricted T cell epitopes is determined by both peptide supply and MHC allele specific binding preference. J Immunol. 2016;196:1480–7. https://doi.org/10.4049/jimmunol.1501721.

Cited by 3 articles.

1. Nonclinical immunogenicity risk assessment for knobs-into-holes bispecific IgG1 antibodies;mAbs;2024-06-06

2. Next-generation IEDB tools: a platform for epitope prediction and analysis;Nucleic Acids Research;2024-05-23

3. Variable domain mutational analysis to probe the molecular mechanisms of high viscosity of an IgG1 antibody;mAbs;2024-01-25