Authors:
Alves Gelio, Ogurtsov Aleksey Y, Yu Yi-Kuo
Abstract
Background
The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides.
Results
Using a simple scoring scheme, we propose a database search method, RAId_DbS, with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the collected score statistics. The t-tests may be used to measure the reliability of reported statistics: when combined with the P-value reported for a peptide hit under a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with those of several other search tools. The executables and data related to RAId_DbS are freely available upon request.
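To make the statistical terms above more concrete, here is a minimal Python sketch built on stated assumptions. It is not the RAId_DbS implementation and does not reproduce its scoring function or its theoretical tail distribution. It assumes that higher scores are better, that the empirical P-value of a score s is the fraction of database peptides (scored against one spectrum) with a score of at least s, and that the E-value is that P-value multiplied by the number of peptides scored, which is a common convention. The helper names (`empirical_pvalue`, `evalue_from_pvalue`, `one_sample_t`, `tail_agreement_t`) are hypothetical. The log-scale comparison of empirical and theoretical tail P-values with a one-sample Student's t statistic illustrates the kind of agreement check the abstract describes, but it is not the authors' procedure.

```python
"""Hypothetical sketch of per-spectrum score statistics (not RAId_DbS code).

Assumptions: higher scores are better; the empirical P-value of a score s is
the fraction of scored database peptides with score >= s; the E-value is the
P-value times the number of peptides scored.
"""
import math
import statistics
from typing import Callable, Sequence


def empirical_pvalue(score: float, all_scores: Sequence[float]) -> float:
    """Fraction of scored peptides whose score is at least `score`."""
    if not all_scores:
        raise ValueError("no scores collected for this spectrum")
    return sum(1 for s in all_scores if s >= score) / len(all_scores)


def evalue_from_pvalue(pvalue: float, n_scored: int) -> float:
    """Expected number of random peptides scoring at least as well (assumed convention)."""
    return pvalue * n_scored


def one_sample_t(values: Sequence[float]) -> float:
    """Student's t statistic for the null hypothesis mean(values) == 0."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0.0:
        # Constant differences: t is 0 if they are all zero, otherwise unbounded.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(values)))


def tail_agreement_t(
    all_scores: Sequence[float],
    theory_pvalue: Callable[[float], float],
    tail_fraction: float = 0.1,
) -> float:
    """Compare log10 empirical vs. theoretical P-values over the top-scoring tail.

    `theory_pvalue` is a hypothetical stand-in for a score-distribution model;
    it must return a value in (0, 1] for every tail score.
    """
    ranked = sorted(all_scores, reverse=True)
    k = max(2, int(len(ranked) * tail_fraction))
    diffs = [
        math.log10(empirical_pvalue(s, ranked)) - math.log10(theory_pvalue(s))
        for s in ranked[:k]
    ]
    # t near 0: no systematic bias; t < 0: the model's tail is heavier than
    # observed (conservative P-values); t > 0: the model exaggerates significance.
    return one_sample_t(diffs)


if __name__ == "__main__":
    # Toy data: 100 evenly spaced scores stand in for one spectrum's database scores.
    scores = [float(i) for i in range(1, 101)]
    p = empirical_pvalue(100.0, scores)
    print("P =", p, "E =", evalue_from_pvalue(p, len(scores)))
    # A toy "theory" that matches the uniform toy data, and one that doubles its tail.
    print(tail_agreement_t(scores, lambda s: (101 - s) / 100))
    print(tail_agreement_t(scores, lambda s: 2 * (101 - s) / 100))
```

The comparison uses a log10 scale because tail P-values span several orders of magnitude, so ratios matter more than absolute differences. A faithful check would substitute the paper's own theoretical tail distribution and test design for the toy model used here.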
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics; General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics; Immunology