Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

Published: 2019-08-01, Volume: 24, Issue: 15, Page: 2811
ISSN: 1420-3049
Journal: Molecules
Language: English

Authors:

Rácz, Bajusz, Héberger

Abstract

Machine learning classification algorithms are widely used to predict and classify various properties of molecules, such as toxicity or biological activity. Predicting toxic vs. non-toxic molecules is important because the alternative, testing on living animals, has both ethical and cost drawbacks. The quality of classification models can be assessed with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison of different performance metrics and machine learning classification methods. Well-established, standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities), and the robust yet sensitive sum of ranking differences (SRD) and analysis of variance (ANOVA) were used for evaluation. The effects of dataset composition (balanced vs. imbalanced) and of 2-class vs. multiclass classification scenarios were also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.
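The two technical ideas in the abstract, that performance metrics react differently to an imbalanced dataset and that sum of ranking differences (SRD) can compare such metrics, can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes a 2-class confusion matrix, only four metrics (accuracy ACC, balanced accuracy BACC, F1 and the Matthews correlation coefficient MCC), and the row-wise mean of min-max scaled columns as the SRD reference, which is a common consensus choice. The function names and the confusion-matrix counts are hypothetical, and the randomization-based validation usually paired with SRD is omitted.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """ACC, BACC, F1 and MCC from a 2-class confusion matrix (hypothetical helper).

    Assumes both classes occur in the data (tp + fn > 0 and tn + fp > 0).
    """
    acc = (tp + tn) / (tp + fn + fp + tn)
    sens = tp / (tp + fn)                      # true positive rate (recall)
    spec = tn / (tn + fp)                      # true negative rate
    bacc = (sens + spec) / 2
    prec = tp / (tp + fp) if tp + fp else 0.0  # precision; 0 if nothing predicted positive
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "BACC": bacc, "F1": f1, "MCC": mcc}


def average_ranks(values):
    """1-based ranks of values; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def sum_of_ranking_differences(table):
    """SRD per column of `table` (dict: column name -> values, one per row object).

    Assumed reference: row-wise mean of min-max scaled columns. A smaller SRD means
    the column orders the row objects more like this consensus. Column ranks are
    taken from the raw values, since min-max scaling does not change the order.
    """
    cols = list(table)
    n_rows = len(table[cols[0]])
    scaled = {}
    for c in cols:
        lo, hi = min(table[c]), max(table[c])
        scaled[c] = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in table[c]]
    reference = [sum(scaled[c][i] for c in cols) / len(cols) for i in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        c: sum(abs(a - b) for a, b in zip(average_ranks(table[c]), ref_ranks))
        for c in cols
    }


if __name__ == "__main__":
    # Hypothetical confusion matrices (tp, fn, fp, tn) on an imbalanced set of
    # 10 positives and 90 negatives; the counts are invented for illustration.
    models = {
        "all-negative": (0, 10, 0, 90),
        "model_A": (2, 8, 1, 89),
        "model_B": (8, 2, 15, 75),
        "model_C": (6, 4, 5, 85),
    }
    scores = {name: binary_metrics(*cm) for name, cm in models.items()}
    for name, s in scores.items():
        print(name, {k: round(v, 3) for k, v in s.items()})

    # Rows = models, columns = metrics; SRD compares each metric with the consensus.
    table = {m: [scores[name][m] for name in models] for m in ["ACC", "BACC", "F1", "MCC"]}
    print(sum_of_ranking_differences(table))
```

With these invented counts, the trivial all-negative predictor reaches an accuracy of 0.9 but a balanced accuracy of only 0.5, the kind of composition sensitivity the abstract reports for 2-class problems. In a real comparison the rows would be many models or datasets, and the SRD values would be judged against a randomization baseline rather than read in isolation.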

Publisher

MDPI AG

Subject

Chemistry (miscellaneous), Analytical Chemistry, Organic Chemistry, Physical and Theoretical Chemistry, Molecular Medicine, Drug Discovery, Pharmaceutical Science

Link

https://www.mdpi.com/1420-3049/24/15/2811/pdf

References: 34 articles.

1. Applications of machine learning in drug discovery and development

2. The rise of deep learning in drug discovery

3. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

4. Performance Measures for Binary Classification; Berrar; Encycl. Bioinform. Comput. Biol.; 2019

5. Sum of ranking differences compares methods or models fairly

Cited by 73 articles.

1. The application of chemical similarity measures in an unconventional modeling framework c-RASAR along with dimensionality reduction techniques to a representative hepatotoxicity dataset; Scientific Reports; 2024-09-06

2. An efficient fake account identification in social media networks: Facebook and Instagram using NSGA-II algorithm; Neural Computing and Applications; 2024-08-28

3. In Silico Exploration of Novel EGFR Kinase Mutant-Selective Inhibitors Using a Hybrid Computational Approach; Pharmaceuticals; 2024-08-23

4. Harnessing Machine Learning to Uncover Hidden Patterns in Azole-Resistant CYP51/ERG11 Proteins; Microorganisms; 2024-07-25

5. Graphical Insight: Revolutionizing Seizure Detection with EEG Representation; Biomedicines; 2024-06-10