Sample Selection for Statistical Parsing

Published:2004-09 Issue:3 Volume:30 Page:253-276
ISSN:0891-2017
Container-title:Computational Linguistics
language:en
Short-container-title:Computational Linguistics

Author:

Hwa Rebecca¹

Affiliation:

1. University of Pittsburgh, Computer Science Department, Pittsburgh, PA 15260.

Abstract

Corpus-based statistical parsing relies on using large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes to use sample selection to find helpful training examples and reduce human effort spent on annotating less informative ones. We consider several criteria for predicting whether unlabeled data might be a helpful training example. Experiments are performed across two syntactic learning tasks and within the single task of parsing across two learning models to compare the effect of different predictive criteria. We find that sample selection can significantly reduce the size of annotated training corpora and that uncertainty is a robust predictive criterion that can be easily applied to different learning models.
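The uncertainty criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation. It assumes a hypothetical parser that returns a probability for each candidate parse of a sentence, scores each unlabeled sentence by the Shannon entropy of that distribution, and sends the highest-scoring sentences to the annotator. The names `parse_entropy`, `select_most_uncertain`, and the toy `pool` are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def parse_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a distribution over candidate parses.

    The scores are renormalized first, so unnormalized parser scores
    are accepted as long as they are non-negative.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("parse scores must sum to a positive value")
    entropy = 0.0
    for p in probs:
        if p > 0:
            q = p / total
            entropy -= q * math.log2(q)
    return entropy


def select_most_uncertain(
    candidates: Dict[str, Sequence[float]], batch_size: int
) -> List[str]:
    """Return the batch_size sentences whose parse distributions have the highest entropy."""
    ranked = sorted(
        candidates,
        key=lambda sentence: parse_entropy(candidates[sentence]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical pool: each sentence maps to the model's probabilities
# for its candidate parses (made-up numbers for illustration only).
pool = {
    "sentence_a": [0.97, 0.02, 0.01],  # model is confident
    "sentence_b": [0.40, 0.35, 0.25],  # model is unsure
    "sentence_c": [0.50, 0.50],        # two equally likely parses
}

selected = select_most_uncertain(pool, batch_size=2)
print(selected)
```

Raw entropy tends to favor sentences with many candidate parses, so a practical system would likely normalize the score (for example by sentence length or by the number of parses) before ranking.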

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/0891201041850894

References: 6 articles.

1. Improving generalization with active learning

2. Tree adjunct grammars

3. The estimation of stochastic context-free grammars using the Inside-Outside algorithm

Cited by 53 articles.

1. Rethinking deep active learning for medical image segmentation: A diffusion and angle-based framework;Biomedical Signal Processing and Control;2024-10

2. Closer in time and higher correlation: disclosing the relationship between citation similarity and citation interval;Scientometrics;2024-06-20

3. Partial Image Active Annotation (PIAA): An Efficient Active Learning Technique Using Edge Information in Limited Data Scenarios;KI - Künstliche Intelligenz;2024-06-12

4. Pseudo-labeling and clustering-based active learning for imbalanced classification of wafer bin map defects;Signal, Image and Video Processing;2023-12-22

5. Discwise Active Learning for LiDAR Semantic Segmentation;IEEE Robotics and Automation Letters;2023-11