Affiliation:
1. Center for Applied Mathematics,Tianjin University, China
Abstract
”Easy/hard sample” is a popular parlance in machine learning. Learning difficulty of samples refers to how easy/hard a sample is during a learning procedure. An increasing need of measuring learning difficulty demonstrates its importance in machine learning (e.g., difficulty-based weighting learning strategies). Previous literature has proposed a number of learning difficulty measures. However, no comprehensive investigation for learning difficulty is available to date, resulting in that nearly all existing measures are heuristically defined without a rigorous theoretical foundation. This study attempts to conduct a pilot theoretical study for learning difficulty of samples. First, influential factors for learning difficulty are summarized. Under various situations conducted by summarized influential factors, correlations between learning difficulty and two vital criteria of machine learning, namely generalization error and model complexity, are revealed. Second, a theoretical definition of learning difficulty is proposed on the basis of these two criteria. A practical measure of learning difficulty is proposed under the direction of the theoretical definition by importing the bias-variance trade-off theory. Subsequently, the rationality of theoretical definition and the practical measure is respectively demonstrated by analysis of several classical weighting methods and abundant experiments realized under all situations conducted by summarized influential factors. The mentioned weighting methods can be reasonable explained under proposed theoretical definition and concerned propositions. The comparison in these experiments indicates that the proposed measure significantly outperforms the other measures throughout the experiments.
Publisher
Association for Computing Machinery (ACM)