Tuning Out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature

Published: 2023-10 Issue: 1 Volume: 60 Page: 681-685
ISSN: 2373-9231
Container-title: Proceedings of the Association for Information Science and Technology
Language: en

Author:

Parulian Nikolaus Nova¹, Dubnicek Ryan¹, Evans Daniel J.¹, Hu Yuerong¹, Layne‐Worthey Glen¹, Downie J. Stephen¹, Heaton Raina², Lu Kun², Orr Raymond I.³, Magni Isabella⁴, Walsh John A.⁵

Affiliation:

1. University of Illinois at Urbana‐Champaign, USA

2. University of Oklahoma, USA

3. Dartmouth College, USA

4. University of Sheffield, UK

5. Indiana University, USA

Abstract

Named Entity Recognition (NER), the automated identification and tagging of entities in text, is a popular natural language processing task and has the power to transform restricted data into open datasets of entities for further research. This project benchmarks four NER models (Stanford NER, BookNLP, spaCy‐trf, and RoBERTa) to identify the most accurate approach and to generate an open‐access, gold‐standard dataset of human‐annotated entities. To meet a real‐world use case, we benchmark these models on a sample dataset of sentences from Native American‐authored literature, identifying edge cases and areas of improvement for future NER work.

Publisher

Wiley

Subject

Library and Information Sciences,General Computer Science

References (25 articles)

1. About. (n.d.). Happy Transformer. Retrieved April 17, 2023, from https://happytransformer.com/

2. Bamman, D., Underwood, T., & Smith, N. A. (2014). A Bayesian Mixed Effects Model of Literary Character. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 370–379. https://doi.org/10.3115/v1/P14-1035

3. Chilet, J. A., Chen, C., & Lin, Y. (2016). Analyzing social media marketing in the high‐end fashion industry using Named Entity Recognition. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 621–622. https://doi.org/10.1109/ASONAM.2016.7752300

4. Evaluating named entity recognition tools for extracting social networks from novels

5. Devlin, J., Chang, M.‐W., Lee, K., & Toutanova, K. (2018, October 11). BERT: Pre‐training of Deep Bidirectional Transformers for Language Understanding. ArXiv.Org. https://arxiv.org/abs/1810.04805v2