Exploring an experiment-split method to estimate the generalization ability in new data: DeepKme as an example

Published: 2021-03-19

Author:

Zou Guoyang, Li Lei

Abstract

A large number of predictors have been built on different data sets to predict different post-translational modification sites. However, to the best of our knowledge, most of them gave an overfitted (overly optimistic) estimate of their generalization ability on new data because of an intrinsic trait of the cross-validation method: it does not consider the experimental sources of the new data. We therefore proposed and explored a new method, the experiment-split method, which imitates blinded assessment to deal with the overfitting problem on new data. The experiment-split method splits the training and test data according to the data's experimental sources, and new data can be regarded as data from different experimental sources. To illustrate the experiment-split method concretely, we applied it to DeepKme, a predictor we built for lysine methylation sites, to demonstrate how it can be used in real scenarios. We compared the cross-validation method with the experiment-split method. The results suggested that the experiment-split method could effectively relieve the overfitting compared with the cross-validation method and may be widely applicable to identification tasks whose data come from multiple experiments. We believe DeepKme will help researchers in related areas think more deeply about the experiment-split method and the overfitting phenomenon, and will advance the study of lysine methylation and similar fields.
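The splitting idea described in the abstract can be sketched as a leave-one-experiment-out partition. This is only an illustrative sketch, not the DeepKme implementation: the function name `experiment_split`, the record fields (`peptide`, `label`, `source`), and the toy peptides are all assumptions made for the example.

```python
from collections import defaultdict


def experiment_split(records):
    """Yield (held_out_source, train, test), holding out one experimental source per fold.

    Hypothetical helper: each record is assumed to carry a "source" key that
    names the experiment it came from.
    """
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec["source"]].append(rec)

    for held_out in sorted(by_source):
        test = by_source[held_out]
        # Training data comes only from the other experiments, so the test
        # fold plays the role of "new data" from an unseen source.
        train = [r for src, recs in by_source.items() if src != held_out for r in recs]
        yield held_out, train, test


# Toy records (invented for illustration): a lysine-centered window,
# a methylation label, and the experiment that reported the site.
records = [
    {"peptide": "AAKAA", "label": 1, "source": "exp_A"},
    {"peptide": "GGKGG", "label": 0, "source": "exp_A"},
    {"peptide": "LLKLL", "label": 1, "source": "exp_B"},
    {"peptide": "SSKSS", "label": 0, "source": "exp_B"},
    {"peptide": "TTKTT", "label": 1, "source": "exp_C"},
    {"peptide": "VVKVV", "label": 0, "source": "exp_C"},
]

folds = list(experiment_split(records))
for held_out, train, test in folds:
    train_sources = {r["source"] for r in train}
    # No record from the held-out experiment may appear in the training set.
    print(held_out, len(train), len(test), held_out not in train_sources)
```

By contrast, an ordinary random k-fold split would typically place records from the same experiment in both the training and the test folds, which is the leakage the experiment-split method is meant to avoid. For array-based data, scikit-learn's `LeaveOneGroupOut` and `GroupKFold` splitters offer the same kind of group-aware partitioning.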

Publisher

Cold Spring Harbor Laboratory
