Apples-to-apples in cross-validation studies

Published: 2010-11-09
Issue: 1
Volume: 12
Pages: 49-57
ISSN: 1931-0145
Container-title: ACM SIGKDD Explorations Newsletter
Language: en
Short-container-title: SIGKDD Explor. Newsl.

Author:

Forman George¹, Scholz Martin¹

Affiliation:

1. Hewlett-Packard Labs, Palo Alto, CA

Abstract

Cross-validation is a mainstay for measuring performance and progress in machine learning. There are subtle differences in how exactly to compute accuracy, F-measure and Area Under the ROC Curve (AUC) in cross-validation studies. However, these details are not discussed in the literature, and incompatible methods are used by various papers and software packages, which leads to inconsistency across the research literature. Anomalies in performance calculations for particular folds and situations go undiscovered when they are buried in results aggregated over many folds and datasets, without anyone ever looking at the intermediate performance measurements. This research note clarifies and illustrates the differences, and it provides guidance on how best to measure classification performance under cross-validation. In particular, there are several divergent methods for computing F-measure, which is often recommended as a performance measure under class imbalance, e.g., for text classification domains and in one-vs.-all reductions of datasets having many classes. We show by experiment that all but one of these computation methods lead to biased measurements, especially under high class imbalance. This paper is of particular interest to those designing machine learning software libraries and to researchers focused on high class imbalance.
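To make the divergence concrete, the following minimal Python sketch compares three common ways of aggregating F-measure over cross-validation folds. The abstract does not name the methods it compares, so the three strategies, the function names, the per-fold counts, and the convention of scoring an undefined ratio as 0 are all assumptions chosen for illustration. The sketch shows only that the strategies disagree on the same folds; it does not show which one is unbiased.

```python
from typing import List, Tuple

# Hypothetical per-fold confusion counts: (true pos., false pos., false neg.)
Counts = Tuple[int, int, int]


def precision(tp: int, fp: int) -> float:
    # Assumed convention: precision is 0 when nothing was predicted positive.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    # Assumed convention: recall is 0 when the fold has no positives.
    return tp / (tp + fn) if tp + fn else 0.0


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    # F1 = 2*TP / (2*TP + FP + FN); scored as 0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f_mean_of_folds(folds: List[Counts]) -> float:
    # Strategy A: compute F1 in each fold, then average the fold scores.
    return sum(f1_from_counts(*c) for c in folds) / len(folds)


def f_of_mean_pr(folds: List[Counts]) -> float:
    # Strategy B: average precision and recall over folds, then combine.
    p = sum(precision(tp, fp) for tp, fp, _ in folds) / len(folds)
    r = sum(recall(tp, fn) for tp, _, fn in folds) / len(folds)
    return 2 * p * r / (p + r) if p + r else 0.0


def f_of_pooled_counts(folds: List[Counts]) -> float:
    # Strategy C: sum the counts over all folds, then compute F1 once.
    tp = sum(c[0] for c in folds)
    fp = sum(c[1] for c in folds)
    fn = sum(c[2] for c in folds)
    return f1_from_counts(tp, fp, fn)


# Illustrative (made-up) folds with few positives, as under class imbalance.
folds = [(2, 1, 0), (0, 0, 2), (1, 3, 1)]

print(round(f_mean_of_folds(folds), 3))     # 0.378
print(round(f_of_mean_pr(folds), 3))        # 0.379
print(round(f_of_pooled_counts(folds), 3))  # 0.462
```

The second fold predicts no positives at all, so its precision is undefined and its score depends entirely on the assumed convention. With few positive examples per fold, such cases become common, which is one way the choice of aggregation can shift the reported F-measure.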

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1882471.1882479

References (7 articles)

1. BNS feature scaling

2. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring

3. The WEKA data mining software

4. D. D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Volume 5, pages 361-397, 2004. http://www.jmlr.org/papers/volume5/lewis04a/lewis04a.pdf

Cited by 235 articles.

1. Assessing external validity in practice;Research in Economics;2024-09

2. GEMTELLIGENCE: Accelerating gemstone classification with deep learning;Communications Engineering;2024-08-20

3. Evaluating deep learning methods applied to Landsat time series subsequences to detect and classify boreal forest disturbances events: The challenge of partial and progressive disturbances;Remote Sensing of Environment;2024-05

4. Convolutional neural network-based real-time mosquito genus identification using wingbeat frequency: A binary and multiclass classification approach;Ecological Informatics;2024-05

5. Utilizing Artificial Intelligence for Text Classification in Communication Sciences;Advances in Computational Intelligence and Robotics;2024-03-15