Challenges in the real world use of classification accuracy metrics: From recall and precision to the Matthews correlation coefficient

Published: 2023-10-04 Issue: 10 Volume: 18 Page: e0291908
ISSN: 1932-6203
Container-title: PLOS ONE
Language: en
Short-container-title: PLoS ONE

Author:

Foody, Giles M.

Abstract

The accuracy of a classification is fundamental to its interpretation, use and ultimately decision making. Unfortunately, the apparent accuracy assessed can differ greatly from the true accuracy. Mis-estimation of classification accuracy metrics and associated mis-interpretations are often due to variations in prevalence and the use of an imperfect reference standard. The fundamental issues underlying the problems associated with variations in prevalence and reference standard quality are revisited here for binary classifications, with particular attention focused on the use of the Matthews correlation coefficient (MCC). A key attribute claimed of the MCC is that a high value can only be attained when the classification performed well on both classes in a binary classification. However, it is shown here that the apparent magnitude of a set of popular accuracy metrics used in fields such as computer science, medicine and environmental science (Recall, Precision, Specificity, Negative Predictive Value, J, F1, likelihood ratios and MCC) and one key attribute (prevalence) were all influenced greatly by variations in prevalence and use of an imperfect reference standard. Simulations using realistic values for data quality in applications such as remote sensing showed each metric varied over the range of possible prevalence and at differing levels of reference standard quality. The direction and magnitude of accuracy metric mis-estimation were a function of prevalence and the size and nature of the imperfections in the reference standard. It was evident that the apparent MCC could be substantially under- or over-estimated. Additionally, a high apparent MCC arose from an unquestionably poor classification. As with some other metrics of accuracy, the utility of the MCC may be overstated and apparent values need to be interpreted with caution. Apparent accuracy and prevalence values can be misleading, and calls for the issues to be recognised and addressed should be heeded.
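The dependence of the metrics on prevalence and reference quality can be illustrated with a minimal Python sketch. It is not the paper's simulation design: the function names `confusion_proportions` and `metrics` are hypothetical, the values of 0.9 are purely illustrative, and the errors of the classifier and of the reference standard are assumed to be conditionally independent given the true class, which is a simplifying assumption made here.

```python
from math import sqrt


def confusion_proportions(prevalence, sens, spec, ref_sens=1.0, ref_spec=1.0):
    """Apparent confusion-matrix proportions (tp, fp, fn, tn).

    The classifier is compared against a reference standard that may itself
    be imperfect. ASSUMPTION (not from the source): classifier and reference
    errors are conditionally independent given the true class.
    """
    p, q = prevalence, 1.0 - prevalence
    tp = p * sens * ref_sens + q * (1 - spec) * (1 - ref_spec)
    fp = p * sens * (1 - ref_sens) + q * (1 - spec) * ref_spec
    fn = p * (1 - sens) * ref_sens + q * spec * (1 - ref_spec)
    tn = p * (1 - sens) * (1 - ref_sens) + q * spec * ref_spec
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Metrics named in the abstract, computed from confusion-matrix cells.

    No guard against zero denominators: inputs are assumed to be
    non-degenerate (for example, specificity strictly between 0 and 1).
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "prevalence": tp + fn,  # apparent prevalence according to the reference
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "NPV": npv,
        "J": recall + specificity - 1,  # Youden's J
        "F1": 2 * precision * recall / (precision + recall),
        "LR+": recall / (1 - specificity),
        "LR-": (1 - recall) / specificity,
        "MCC": (tp * tn - fp * fn) / mcc_den,
    }


if __name__ == "__main__":
    # Illustrative values only: the same classifier (sens = spec = 0.9)
    # assessed at three prevalences, with a perfect and an imperfect reference.
    for prev in (0.1, 0.5, 0.9):
        perfect = metrics(*confusion_proportions(prev, 0.9, 0.9))
        imperfect = metrics(*confusion_proportions(prev, 0.9, 0.9, 0.9, 0.9))
        print(f"prevalence={prev:.1f}  "
              f"MCC(perfect ref)={perfect['MCC']:.3f}  "
              f"MCC(imperfect ref)={imperfect['MCC']:.3f}  "
              f"apparent prevalence={imperfect['prevalence']:.2f}")
```

Under these assumptions the classifier itself never changes, yet the apparent MCC, precision and prevalence all shift with the true prevalence and with the quality of the reference, which is the kind of behaviour the abstract describes.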

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Cited by 1 article.

1. Evaluation of Multiple Classifier Systems for Mapping Different Hierarchical Levels of Forest Ecosystems in the Mediterranean Region Using Sentinel-2, Sentinel-1, and ICESat-2 Data;Forests;2023-11-11