Affiliation:
1. Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
2. Department of Business Administration, Technology and Social Sciences, Luleå University of Technology, Luleå, Sweden
Abstract
In most data analytic applications in manufacturing, understanding the data‐driven models plays a crucial role in complementing the engineering knowledge about the production process. Identifying relevant input variables, rather than only predicting the response through some “black‐box” model, is of great interest in many applications. There is, therefore, a growing focus on describing the contributions of the input variables to the model in the form of “variable importance”, which is readily available in certain machine learning methods such as random forest (RF). Once a ranking based on the importance measure of the variables is established, the question arises of how many variables are truly relevant in predicting the output variable. In this study, we focus on the Boruta algorithm, which is a wrapper around the RF model. It is a variable selection tool that assesses the variable importance measure for the RF model. It has previously been shown in the literature that correlation among the input variables, which is common in high‐dimensional data, distorts the importance of variables and leads to its overestimation. The Boruta algorithm is also affected by this, resulting in a larger set of input variables being deemed important. To overcome this issue, we propose an extension of the Boruta algorithm for correlated data that exploits the conditional importance measure. This extension greatly improves the Boruta algorithm in the case of high correlation among variables and provides a more precise ranking of the variables that significantly contribute to the response. We believe this approach can be used in many industrial applications by providing more transparency and understanding of the process.
Subject
Management Science and Operations Research; Safety, Risk, Reliability and Quality
Cited by
3 articles.
1. Correlation to causality;Quality Engineering;2024-07-02
2. The ENBIS‐22 quality and reliability engineering international special issue;Quality and Reliability Engineering International;2023-12-17
3. Stream-Based Active Learning for Regression with Dynamic Feature Selection;2023 Fifth International Conference on Transdisciplinary AI (TransAI);2023-09-25