Automatic Optimization of Deep Learning Training through Feature-Aware-Based Dataset Splitting

Author:

Shahrabadi Somayeh (1, 2), Adão Telmo (2, 3), Peres Emanuel (3, 4, 5), Morais Raul (3, 4, 5), Magalhães Luís G. (2), Alves Victor (2)

Affiliation:

1. Centro de Computação Gráfica—CCG/zgdv, University of Minho, Campus de Azurém, Edifício 14, 4800-058 Guimarães, Portugal

2. ALGORITMI Research Centre/LASI, University of Minho, 4710-057 Guimarães, Portugal

3. Department of Engineering, School of Sciences and Technology, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal

4. Centre for the Research and Technology of Agro-Environmental and Biological Sciences, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal

5. Institute for Innovation, Capacity Building and Sustainability of Agri-Food Production, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal

Abstract

The proliferation of classification-capable artificial intelligence (AI) across a wide range of domains (e.g., agriculture and construction) has made it possible to optimize and complement several tasks that are typically performed by humans. The training needed to provide such support is frequently hindered by dataset-related challenges, including the scarcity of examples and imbalanced class distributions, which are detrimental to the production of accurate models. Addressing these challenges properly requires strategies smarter than traditional brute-force K-fold cross-validation or naive hold-out, with two main goals in mind: (1) carrying out one-shot, close-to-optimal data arrangements that accelerate conventional training optimization; and (2) maximizing the capacity of inference models while relieving the computational burden. To that end, this paper proposes two image-based, feature-aware dataset splitting approaches, under the hypothesis that they help attain classification models closer to their full inference potential. Both rely on strategic image harvesting: one hinges on weighted random selection from a set of feature-based clusters, while the other involves a balanced picking process from a sorted list that stores each data feature's distance to the centroid of the whole feature space. Comparative tests on datasets related to grapevine leaf phenotyping and bridge defects show promising results, highlighting a viable alternative to K-fold cross-validation and hold-out methods.
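The two splitting ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting scheme, the clustering method, or the picking rule, so everything below is an assumption. The function names, the `val_fraction` and `seed` parameters, the choice to weight clusters by their remaining size, and the evenly spaced picking rule are all hypothetical. Features are assumed to be plain numeric vectors, and cluster labels are assumed to have been computed beforehand.

```python
import math
import random
from collections import defaultdict


def cluster_weighted_split(cluster_labels, val_fraction=0.2, seed=0):
    """Hypothetical sketch: draw validation samples by weighted random
    selection over clusters (weight = remaining cluster size, an assumption)."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        clusters[label].append(idx)
    n_val = max(1, round(len(cluster_labels) * val_fraction))
    val = []
    for _ in range(n_val):
        # Only clusters that still have members can be drawn from.
        labels = [c for c in clusters if clusters[c]]
        weights = [len(clusters[c]) for c in labels]
        chosen = rng.choices(labels, weights=weights, k=1)[0]
        members = clusters[chosen]
        val.append(members.pop(rng.randrange(len(members))))
    train = sorted(i for members in clusters.values() for i in members)
    return train, sorted(val)


def centroid_distance_split(features, val_fraction=0.2):
    """Hypothetical sketch: sort samples by distance to the global centroid,
    then pick validation samples at evenly spaced ranks of that sorted list."""
    n = len(features)
    dim = len(features[0])
    centroid = [sum(f[d] for f in features) / n for d in range(dim)]
    order = sorted(range(n), key=lambda i: math.dist(features[i], centroid))
    n_val = max(1, round(n * val_fraction))
    step = n / n_val
    # Evenly spaced ranks, so validation covers near and far samples alike.
    val_ranks = {int(k * step + step / 2) for k in range(n_val)}
    val = [order[r] for r in sorted(val_ranks)]
    train = [order[r] for r in range(n) if r not in val_ranks]
    return train, val


if __name__ == "__main__":
    toy_features = [[float(i), 0.0] for i in range(10)]  # toy data, not from the paper
    toy_labels = [0] * 6 + [1] * 4                        # toy cluster assignment
    print(cluster_weighted_split(toy_labels, val_fraction=0.3, seed=1))
    print(centroid_distance_split(toy_features, val_fraction=0.2))
```

Both functions return index lists rather than copies of the data, so the same split can be applied to images, labels, and feature vectors alike. In practice the feature vectors would come from some image descriptor or pretrained network, which the abstract does not specify.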

Funder

RRP—Recovery and Resilience Plan

European Next Generation EU Funds

FCT-Portuguese Foundation for Science and Technology

Publisher

MDPI AG
