Author:
Bali Manish, Anandaraj S.P.
Abstract
Current biomedical named entity recognition (BioNER) systems are mostly trained on manually labelled data. However, large amounts of annotated data can be difficult to obtain, especially in highly specialized fields such as biomedicine and bioinformatics. When domain-specific knowledge resources such as dictionaries and ontologies are available, they can be used to generate automatically tagged, distantly supervised biomedical training data. The output of such distant supervision is normally noisy: the prevalence of false positives and false negatives in automatically generated data is the main problem, and it directly affects system performance. This research investigates distant supervision for the BioNER task. To detect false positive instances, a reinforcement learning technique is employed, modelled as a graphics processing unit (GPU) accelerated Markov decision process (MDP) with a neural network policy. To deal with false negative cases, we employ a partial annotation conditional random field (CRF) technique. Results on two benchmark datasets show that this state-of-the-art methodology can enhance the performance of the neural NER system. They further show that the proposed approach reduces the need for human-annotated data for BioNER tasks in natural language processing (NLP).
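To make the noise problem concrete, the following is a minimal sketch of dictionary-based distant labelling, assuming a greedy longest-match strategy and BIO tags. It is not the authors' pipeline: the function name `distant_label`, the `max_len` parameter, and the toy dictionary entries are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def distant_label(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in a dictionary.

    `dictionary` maps lower-cased token tuples to an entity type.
    Tokens that match no entry are tagged "O".
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + n])
            if span in dictionary:
                entity_type = dictionary[span]
                tags[i] = f"B-{entity_type}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{entity_type}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical toy dictionary (illustrative entries, not a real resource).
toy_dictionary = {
    ("breast", "cancer"): "Disease",
    ("aspirin",): "Chemical",
    ("cat",): "Gene",  # ambiguous surface form: also an ordinary English word
}

tokens = "The cat was given aspirin for fever".split()
tags = distant_label(tokens, toy_dictionary)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In this toy sentence, "cat" is tagged as a gene although it refers to an animal (a false positive), while "fever" is tagged "O" only because the dictionary lacks it (a false negative). These are the two kinds of noise that the abstract assigns to the reinforcement learning component and to the partial annotation CRF, respectively.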
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction, Software