Fast classification rates without standard margin assumptions

Authors:

Olivier Bousquet¹, Nikita Zhivotovskiy²

Affiliations:

1. Google Research, Brain Team, Brandschenkestrasse 110, 8002 Zürich, Switzerland

2. Google Research, Brain Team, Brandschenkestrasse 110, 8002 Zürich, Switzerland; now at Department of Mathematics, ETH Zürich, Switzerland

Abstract

We consider the classical problem of learning rates for classes with finite Vapnik–Chervonenkis (VC) dimension. It is well known that fast learning rates up to $O\left(\frac{d}{n}\right)$ are achievable by the empirical risk minimization (ERM) algorithm if low-noise or margin assumptions are satisfied. These assumptions usually require the optimal Bayes classifier to be in the class, and it has been shown that when this is not the case, the fast rates cannot be achieved even in the noise-free case. In this paper, we further investigate the question of fast rates in the agnostic setting, when the Bayes classifier is not in the class and noise in the labeling is allowed. First, we consider classification with a reject option, namely Chow’s reject option model, and show that by slightly lowering the impact of hard instances, a learning rate of order $O\left(\frac{d}{n}\log \frac{n}{d}\right)$ is always achievable in the agnostic setting by a specific learning algorithm. Similar results were previously known only under special versions of margin assumptions. As an auxiliary result, we show that under the Bernstein assumption the performance of the proposed algorithm is never worse than that of ERM, even if some of the labels are predicted at random. Based on these results, we derive necessary and sufficient conditions for classification (without a reject option) with fast rates in the agnostic setting achievable by improper learners. This simultaneously extends the work by Massart & Nédélec (2006, Ann. Stat., 34, 2326–2366), which studied this question in the case where the Bayes-optimal rule belongs to the class, and the work by Ben-David and Urner (2014, COLT, pp. 527–542), which allows misspecification but is limited to the noise-free setting. Our result also provides the first general setup in statistical learning theory in which an improper learning algorithm may significantly improve the learning rate for non-convex losses.
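For context, Chow’s reject option model referenced above is commonly formalized as follows; this is a sketch based on the standard formulation, and the exact cost parameterization used in the paper may differ. A classifier $h$ may either predict a label or abstain, and abstention incurs a fixed cost $c$:

$$\ell_c\bigl(h,(x,y)\bigr)=\begin{cases}\mathbf{1}\{h(x)\neq y\}, & h(x)\in\{0,1\},\\ c, & h(x)=\ast\ \text{(abstain)},\end{cases}\qquad 0<c\le \tfrac{1}{2}.$$

In this notation (introduced here for illustration, not taken from the abstract), the claimed guarantee can be read as follows: for a class $\mathcal{F}$ of VC dimension $d$ and $n$ i.i.d. samples, the proposed learner $\hat h$ satisfies

$$\mathbb{E}\,R_c(\hat h)\;-\;\inf_{f\in\mathcal{F}}R(f)\;=\;O\!\left(\frac{d}{n}\log\frac{n}{d}\right),$$

where $R_c$ denotes the expected reject-option loss of $\hat h$ and $R$ the usual zero–one risk, with no margin or low-noise assumption required.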

Publisher

Oxford University Press (OUP)

Subject

Applied Mathematics, Computational Theory and Mathematics, Numerical Analysis, Statistics and Probability, Analysis


Cited by 4 articles.