Affiliation:
1. Plasma Chemistry Research Group, Institute of Materials and Environmental Chemistry, HUN-REN Research Centre for Natural Sciences, Institute of Excellence, Hungarian Academy of Sciences, Magyar tudósok krt. 2., H-1117 Budapest, Hungary
Abstract
Background: Machine learning (ML) methods are being developed and applied so quickly that almost nobody can follow the field in every detail. Unsurprisingly, numerous errors and inconsistencies in their usage have spread at a similar speed, regardless of the task (regression or classification). This work summarizes frequent errors committed by certain authors with the aim of helping scientists to avoid them. Methods: The principle of parsimony governs the train of thought. Fair method comparison can be carried out with multicriteria decision-making techniques, preferably the sum of ranking differences (SRD). Coupling SRD with analysis of variance (ANOVA) decomposes the effects of several factors. Earlier findings are summarized in a review-like manner; the abuse of the correlation coefficient and proper practices for model discrimination are also outlined. Results: Using an illustrative example, the correct practice and the methodology are summarized as guidelines for model discrimination and for minimizing prediction errors. The following factors are all prerequisites for successful modeling: proper data preprocessing, statistical tests, suitable performance parameters, appropriate degrees of freedom, fair comparison of models, and outlier detection, to name a few. A checklist is provided, in a tutorial manner, on how to present ML modeling properly. The advocated practices are reviewed briefly in the discussion. Conclusions: Many of the errors can easily be filtered out with careful reviewing. It is every author's responsibility to adhere to the rules of modeling and validation. A representative sampling of recent literature outlines correct practices and emphasizes that no error-free publication exists.
Funder
Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund
Cited by
2 articles.