Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data

Published: 2024-02-29 Issue: 1 Volume: 5 Page: 384-416
ISSN: 2673-4117
Container-title: Eng
Language: en
Short-container-title: Eng

Author:

Kampezidou Styliani I.¹, Tikayat Ray Archana², Bhat Anirudh Prabhakara³, Pinon Fischer Olivia J.¹, Mavris Dimitri N.¹

Affiliation:

1. Aerospace Systems Design Laboratory, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

2. AI Fusion Technologies, Toronto, ON M5V 3Z5, Canada

3. Amazon, Toronto, ON M5H 4A9, Canada

Abstract

This paper offers a comprehensive examination of the process of developing and automating supervised end-to-end machine learning workflows for forecasting and classification. It gives a complete overview of the components (e.g., feature engineering and model selection), principles (e.g., bias–variance decomposition, model complexity, overfitting, model sensitivity to feature assumptions and scaling, and output interpretability), models (e.g., neural networks and regression models), methods (e.g., cross-validation and data augmentation), metrics (e.g., Mean Squared Error and F1-score), and tools that govern most supervised learning applications with numerical and categorical data, as well as their integration, automation, and deployment. The end goal and contribution of this paper is the education and guidance of the non-AI expert academic community regarding complete and rigorous machine learning workflows and data science practices, from problem scoping to design and state-of-the-art automation tools, including basic principles and reasoning in the choice of methods. The paper delves into the critical stages of supervised machine learning workflow development, many of which are often omitted by researchers, and covers foundational concepts essential for understanding and optimizing a functional machine learning workflow, thereby offering a holistic view of task-specific application development for applied researchers who are non-AI experts. This paper may be of significant value to academic researchers developing and prototyping machine learning workflows for their own research or as customer-tailored solutions for government and industry partners.

Publisher

MDPI AG

Link

https://www.mdpi.com/2673-4117/5/1/21/pdf

References: 263 articles.

1. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges;Gibert;J. Netw. Comput. Appl.,2020

2. Scanflow: A multi-graph framework for Machine Learning workflow management, supervision, and debugging;Liu;Expert Syst. Appl.,2022

3. Intelligent failure prediction models for scientific workflows;Bala;Expert Syst. Appl.,2015

4. Two-stage optimization for machine learning workflow;Quemy;Inf. Syst.,2020

5. Evaluation of machine learning algorithms for forest stand species mapping using Sentinel-2 imagery and environmental data in the Polish Carpathians;Grabska;Remote Sens. Environ.,2020

Cited by 1 article.

1. Development of a Language Model for Named-Entity-Recognition in Aerospace Requirements;Journal of Aerospace Information Systems;2024-06