Structure discovery in PAC-learning by random projections

Author:

Kabán AtaORCID,Reeve Henry

Abstract

AbstractHigh dimensional learning is data-hungry in general; however, many natural data sources and real-world learning problems posses some hidden low-complexity structure that permit effective learning from relatively small sample sizes. We are interested in the general question of how to discover and exploit such hidden benign traits when problem-specific prior knowledge is insufficient. In this work, we address this question through random projection’s ability to expose structure. We study both compressive learning and high dimensional learning from this angle by introducing the notions of compressive distortion and compressive complexity. We give user-friendly PAC bounds in the agnostic setting that are formulated in terms of these quantities, and we show that our bounds can be tight when these quantities are small. We then instantiate these quantities in several examples of particular learning problems, demonstrating their ability to discover interpretable structural characteristics that make high dimensional instances of these problems solvable to good approximation in a random linear subspace. In the examples considered, these turn out to resemble some familiar benign traits such as the margin, the margin distribution, the intrinsic dimension, the spectral decay of the data covariance, or the norms of parameters—while our general notions of compressive distortion and compressive complexity serve to unify these, and may be used to discover benign structural traits for other PAC-learnable problems.

Funder

Engineering and Physical Sciences Research Council

Publisher

Springer Science and Business Media LLC

Reference41 articles.

1. Alon, N., Ben-David, S., Cesa-Bianchi, N., & Haussler, D. (1997). Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 4, 615–631.

2. Arriaga, R. I., & Vempala, S. (1999). An algorithmic theory of learning: Robust concepts and random projection. In 40th Annual Symposium on Foundations of Computer Science (FOCS) (pp. 616–623).

3. Bartl, D., & Mendelson, S. (2022). Random embeddings with an almost Gaussian distortion. Advances in Mathematics, 400, 108261.

4. Bartlett, P. L., & Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482.

5. Crammer, K., Gilad-Bachrach, R., Navot, A., & Tishby, N. (2002). Margin analysis of the LVQ algorithm. In Neural information processing systems (NIPS).

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Efficient learning with projected histograms;Data Mining and Knowledge Discovery;2024-09-01

2. Compressive Mahalanobis Metric Learning Adapts to Intrinsic Dimension;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3