Uncertainty query sampling strategies for active learning of named entity recognition task

Published:2021-03-26 Issue:1 Volume:15 Page:99-114
ISSN:1872-4981
Container-title:Intelligent Decision Technologies
Short-container-title:IDT

Author:

Agrawal Ankit¹,Tripathi Sarsij²,Vardhan Manu¹

Affiliation:

1. Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, India

2. Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, Uttar Pradesh, India

Abstract

Active learning is a well-known approach for labeling large unannotated datasets with minimal effort and in a cost-efficient way. It iteratively selects the most informative instances and adds them to the training set so that the learner's performance improves with each iteration. Named entity recognition (NER) is a key information extraction task in which the entities present in a sequence are labeled with the correct class. Traditional query sampling strategies for active learning consider only the final probability value of the model when selecting the most informative instances. In this paper, we propose a new active learning algorithm based on a hybrid query sampling strategy that considers sentence similarity along with the final probability value of the model. We compare it with active learning approaches based on four other well-known pool-based uncertainty query sampling strategies for NER: least confident sampling, margin of confidence sampling, ratio of confidence sampling, and entropy sampling. The experiments were performed on three biomedical NER datasets from different domains and on a Spanish-language NER dataset. We found that all of these approaches reach the performance of the supervised learning approach while requiring much less annotated training data. The proposed active learning algorithm performs well and, in most cases, further reduces the annotation cost compared with the active learning algorithms based on the other sampling strategies.
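The four baseline strategies named in the abstract can be illustrated with a minimal sketch. It uses the commonly cited textbook definitions of each uncertainty score, applied to a single probability distribution per pool instance. The function names, the `select_queries` helper, and the toy pool are hypothetical and are not taken from the paper, which may aggregate token-level probabilities over a sentence differently. The proposed hybrid strategy is not sketched because the abstract does not say how sentence similarity is combined with the model probability.

```python
import math

# Each score takes one predicted probability distribution (a list of floats
# summing to 1) and returns a value where HIGHER means MORE uncertain.
# These are common textbook definitions, assumed here for illustration.

def least_confidence(probs):
    # 1 minus the probability of the most likely label.
    return 1.0 - max(probs)

def margin_of_confidence(probs):
    # 1 minus the gap between the two most likely labels.
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def ratio_of_confidence(probs):
    # Second most likely probability divided by the most likely one.
    top, second = sorted(probs, reverse=True)[:2]
    return second / top

def entropy(probs):
    # Shannon entropy in nats; zero-probability labels contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

STRATEGIES = {
    "least_confidence": least_confidence,
    "margin_of_confidence": margin_of_confidence,
    "ratio_of_confidence": ratio_of_confidence,
    "entropy": entropy,
}

def select_queries(pool_probs, strategy, batch_size):
    """Return indices of the `batch_size` most uncertain pool instances."""
    score = STRATEGIES[strategy]
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]

# Toy unlabeled pool: one (hypothetical) model distribution per instance.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # highly uncertain prediction
    [0.60, 0.30, 0.10],  # moderately uncertain prediction
]

if __name__ == "__main__":
    for name in STRATEGIES:
        print(name, select_queries(pool, name, batch_size=2))
```

In a pool-based loop, the selected indices would be sent for annotation, moved from the pool into the training set, and the model retrained before the next round of scoring.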

Publisher

IOS Press

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Human-Computer Interaction,Software

Reference: 48 articles.

1. Message understanding conference-6: A brief history;Grishman;Proceedings of the 16th Conference on Computational Linguistics [Internet]. Copenhagen, Denmark: Association for Computational Linguistics,1996

2. A survey of named entity recognition and classification;Nadeau;Lingvisticae Investig,2007

3. Biomedical named entity recognition using two-phase model based on SVMs;Lee;J Biomed Inform [Internet],2004

4. A literature review of social networking-based learning systems using a novel ISO-based framework;Krouska;Intell Decis Technol,2019

5. BioBERT: A pre-trained biomedical language representation model for biomedical text mining;Lee;Bioinformatics [Internet],2019

Cited by 3 articles.

1. iSSL-AL: a deep active learning framework based on self-supervised learning for image classification;Neural Computing and Applications;2024-08-07

2. A study of deep active learning methods to reduce labelling efforts in biomedical relation extraction;PLOS ONE;2023-12-15

3. Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition;Computing;2021-08-28