Gauge-Optimal Approximate Learning for Small Data Classification

Author:

Vecchi Edoardo (1), Bassetti Davide (2), Graziato Fabio (3), Pospíšil Lukáš (4), Horenko Illia (5)

Affiliation:

1. Università della Svizzera Italiana, Faculty of Informatics, Institute of Computing, 6962 Lugano, Switzerland; edoardo.vecchi@usi.ch

2. Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany; bassetti@mathematik.uni-kl.de

3. Independent researcher, 22070 Valmorea, Italy; fabio.graziato94@gmail.com

4. VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic; lukas.pospisil@vsb.cz

5. Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany; horenko@rptu.de

Abstract

Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, common learning tools struggle to separate the features that matter for the classification task from those that carry no relevant information, and they cannot derive an appropriate learning rule for discriminating among the classes. As a potential solution to this problem, we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems in the small data setting. We prove that the optimal solution of the GOAL algorithm consists of piecewise-linear functions in Euclidean space and that it can be approximated through a monotonically convergent algorithm that, under the assumption of a discrete segmentation of the feature space, has a closed-form solution for each optimization substep and an overall iteration cost that scales linearly. We compare the GOAL algorithm with other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (namely, prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.

Publisher

MIT Press

