Affiliation:
1. University of Illinois at Urbana‐Champaign USA
2. University of Oklahoma USA
3. Dartmouth College USA
4. University of Sheffield UK
5. Indiana University USA
Abstract
Named Entity Recognition (NER), the automated identification and tagging of entities in text, is a popular natural language processing task that has the power to transform restricted data into open datasets of entities for further research. This project benchmarks four NER models (Stanford NER, BookNLP, spaCy‐trf, and RoBERTa) to identify the most accurate approach and to generate an open‐access, gold‐standard dataset of human‐annotated entities. To meet a real‐world use case, we benchmark these models on a sample dataset of sentences from Native American‐authored literature, identifying edge cases and areas of improvement for future NER work.
Subject
Library and Information Sciences, General Computer Science