Maximizing classifier utility when training data is costly

Published: 2006-12
Issue: 2
Volume: 8
Page: 31-38
ISSN: 1931-0145
Container-title: ACM SIGKDD Explorations Newsletter
Language: en
Short-container-title: SIGKDD Explor. Newsl.

Author:

Weiss Gary M.¹, Tian Ye¹

Affiliation:

1. Fordham University, Bronx, NY

Abstract

Classification is a well-studied problem in machine learning and data mining. Classifier performance was originally gauged almost exclusively using predictive accuracy. However, as work in the field progressed, more sophisticated measures of classifier utility that better represented the value of the induced knowledge were introduced. Nonetheless, most work still ignored the cost of acquiring training examples, even though this affects the overall utility of a classifier. In this paper we consider the costs of acquiring the training examples in the data mining process; we analyze the impact of the cost of training data on learning, identify the optimal training set size for a given data set, and analyze the performance of several progressive sampling schemes, which, given the cost of the training data, will generate classifiers that come close to maximizing the overall utility.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1233321.1233325

References: 13 articles.

1. Explicitly representing expected cost

2. Learning and Classifying Under Hard Budgets

3. Economical active feature-value acquisition through Expected Utility estimation

Cited by 6 articles.

1. Online learning agents for cost-sensitive topical data acquisition from the web; Intelligent Data Analysis; 2022-04-18

2. Design of Query-Driven System for Time-Utility Based Data Mining on Medical Data; Lecture Notes in Business Information Processing; 2015

3. A survey of emerging approaches to spam filtering; ACM Computing Surveys; 2012-02

4. Fast Data Acquisition in Cost-Sensitive Learning; Advances in Data Mining. Applications and Theoretical Aspects; 2011

5. Knows what it knows: a framework for self-aware learning; Machine Learning; 2010-11-25