On the predictive power of meta-features in OpenML

Author:

Bilalli Besim (1), Abelló Alberto (1), Aluja-Banet Tomàs (2)

Affiliation:

1. Department of Service and Information System Engineering, Polytechnic University of Catalonia (BarcelonaTech), Campus Nord, Jordi Girona, 1–3, 08034, Barcelona, Spain

2. Department of Statistics and Operations Research, Polytechnic University of Catalonia (BarcelonaTech), Campus Nord, Jordi Girona, 1–3, 08034, Barcelona, Spain

Abstract

The demand for data analysis is steadily rising. As a consequence, people with different profiles, including non-experienced users, have started to analyze their own data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (the model/algorithm selection problem). Meta-learning is a technique for assisting non-expert users in this step. Its effectiveness, however, depends largely on the description/characterization of datasets (i.e., the meta-features used for meta-learning). There is therefore a need to improve the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, a collaborative machine learning platform designed to store and organize meta-data about datasets, data mining algorithms, models and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML, the biggest source of data in this domain.
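
The pipeline outlined in the abstract (extract latent features from meta-features, then relate them to algorithm performance) can be illustrated with a minimal sketch. The abstract does not state the exact extraction method, so this sketch assumes principal-component factor extraction on the correlation matrix, without rotation. The function name extract_latent_features, the synthetic meta-feature matrix, and the synthetic accuracy values are all hypothetical; none of them are OpenML data or the authors' code.

```python
import numpy as np

def extract_latent_features(meta_features, n_factors):
    """Principal-component factor extraction (an assumed stand-in method).

    Returns (loadings, scores):
      loadings: (n_meta_features, n_factors)
      scores:   (n_datasets, n_factors)
    """
    X = np.asarray(meta_features, dtype=float)
    # Standardize each meta-feature (column) to zero mean, unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix between meta-features.
    R = np.corrcoef(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_factors]
    # Loadings: eigenvectors scaled by the square root of their eigenvalue.
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    # Scores: each dataset projected onto the retained eigenvectors.
    scores = Z @ eigvecs[:, order]
    return loadings, scores

# Synthetic illustration: 500 hypothetical datasets, 5 meta-features
# driven by two hidden concepts ("size" and "noise").
rng = np.random.default_rng(0)
n_datasets = 500
size = rng.normal(size=n_datasets)
noise = rng.normal(size=n_datasets)
meta = np.column_stack([
    size + 0.1 * rng.normal(size=n_datasets),
    size + 0.1 * rng.normal(size=n_datasets),
    size + 0.1 * rng.normal(size=n_datasets),
    noise + 0.1 * rng.normal(size=n_datasets),
    noise + 0.1 * rng.normal(size=n_datasets),
])
# Hypothetical performance measure that depends only on the "noise" concept.
accuracy = 0.8 - 0.05 * noise + 0.01 * rng.normal(size=n_datasets)

loadings, scores = extract_latent_features(meta, n_factors=2)
# Rank latent features by absolute correlation with the performance measure.
correlations = [abs(np.corrcoef(scores[:, k], accuracy)[0, 1]) for k in range(2)]
best = int(np.argmax(correlations))
print("most predictive latent feature:", best)
```

In a meta-learning setting, the selected columns of scores would then replace the raw meta-features as inputs to the meta-learner. A full exploratory factor analysis would typically also estimate unique variances and apply a rotation, which this sketch omits.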

Publisher

Walter de Gruyter GmbH

Subject

Applied Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
