Committee-Based Sample Selection for Probabilistic Classifiers

Published: 1999-11-15 Volume: 11 Pages: 335-360
ISSN: 1076-9757
Container-title: Journal of Artificial Intelligence Research
Short-container-title: jair

Author:

Argamon-Engelson S., Dagan I.

Abstract

In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by "sample selection". In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two-member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
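
The selection loop the abstract describes can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the model is a toy table of per-word label counts rather than a stochastic part-of-speech tagger, committee members are drawn from a Dirichlet posterior with an assumed uniform prior, and every name (`sample_dirichlet`, `draw_member`, `vote_entropy`, `select_for_labeling`) is hypothetical.

```python
import math
import random
from collections import Counter


def sample_dirichlet(counts, rng, prior=1.0):
    """Draw one multinomial parameter vector from Dirichlet(counts + prior).

    The uniform prior of 1.0 is an assumption made for this sketch.
    """
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def draw_member(label_counts, rng):
    """One committee member: a sampled P(label | word) for every known word."""
    return {word: sample_dirichlet(counts, rng)
            for word, counts in label_counts.items()}


def classify(member, word):
    """Return the index of the label this member considers most probable."""
    probs = member[word]
    return max(range(len(probs)), key=probs.__getitem__)


def vote_entropy(votes, n_labels):
    """Normalized entropy of the votes: 0 = full agreement, 1 = maximal split."""
    k = len(votes)
    if k < 2 or n_labels < 2:
        return 0.0
    h = -sum((c / k) * math.log(c / k) for c in Counter(votes).values())
    return h / math.log(min(k, n_labels))


def select_for_labeling(label_counts, unlabeled, rng, committee_size=2):
    """Keep only the examples on which a freshly drawn committee disagrees."""
    n_labels = len(next(iter(label_counts.values())))
    selected = []
    for word in unlabeled:
        committee = [draw_member(label_counts, rng)
                     for _ in range(committee_size)]
        votes = [classify(member, word) for member in committee]
        if vote_entropy(votes, n_labels) > 0:
            selected.append(word)
    return selected


if __name__ == "__main__":
    # Hypothetical counts of (noun, verb) labels seen so far for two words.
    counts = {"the": [10**6, 0], "run": [5, 5]}
    picked = select_for_labeling(counts, ["the", "run"] * 20, random.Random(0))
    print(Counter(picked))
```

With a two-member committee the rule reduces to "request a label when the two sampled models disagree", so there is no threshold to tune. Words whose counts are lopsided are almost never selected, because nearly every draw from their posterior yields the same classification.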

Publisher

AI Access Foundation

Subject

Artificial Intelligence

Cited by 56 articles.

1. Building a Fully-Automatized Active Learning Framework for the Semantic Segmentation of Geospatial 3D Point Clouds;PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science;2024-04

2. Combining Self-labeling with Selective Sampling;2023 IEEE International Conference on Data Mining Workshops (ICDMW);2023-12-04

3. New Problems in Active Sampling for Mobile Robotic Online Learning;2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC);2023-06

4. Review Classification Based on Machine Learning: Classifying Game User Reviews;IEEE Access;2023

5. Active label distribution learning;Neurocomputing;2021-05