Better Classifier Calibration for Small Datasets

Published:2020-06-30 Issue:3 Volume:14 Page:1-19
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Alasalmi Tuomo¹, Suutala Jaakko¹, Röning Juha¹, Koskimäki Heli²

Affiliation:

1. University of Oulu, Finland

2. Oura Health Ltd., Oulu, Finland

Abstract

Classifier calibration does not always go hand in hand with the classifier’s ability to separate the classes. There are applications where good classifier calibration, i.e., the ability to produce accurate probability estimates, is more important than class separation. When the amount of data for training is limited, the traditional approach to improve calibration starts to crumble. In this article, we show how generating more data for calibration is able to improve calibration algorithm performance in many cases where a classifier is not naturally producing well-calibrated outputs and the traditional approach fails. The proposed approach adds computational cost, but considering that the main use case is with small datasets, this extra computational cost stays insignificant and is comparable to other methods in prediction time. Of the tested classifiers, the largest improvement was detected with the random forest and naive Bayes classifiers. Therefore, the proposed approach can be recommended at least for those classifiers when the amount of data available for training is limited and good calibration is essential.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3385656

Reference: 32 articles.

1. Getting More Out of Small Data Sets - Improving the Calibration Performance of Isotonic Regression by Generating More Data

2. Combined 5 × 2 cv F Test for Comparing Supervised Classification Learning Algorithms

3. Calibrating Random Forests

Cited by 7 articles.

1. The development of fragility curves using calibrated probabilistic classifiers;Structures;2024-06

2. On application of machine learning classifiers in evaluating liquefaction potential of civil infrastructure;Interpretable Machine Learning for the Analysis, Design, Assessment, and Informed Decision Making for Civil Infrastructure;2024

3. Combining Representation Learning and Active Learning for Applications in Process Manufacturing;Chemie Ingenieur Technik;2023-04-25

4. Machine Learning Experiments with Artificially Generated Big Data from Small Immunotherapy Datasets;2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA);2022-12

5. Exploring the Characteristics and Security Risks of Emerging Emoji Domain Names;Computer Security – ESORICS 2022;2022