Named-entity recognition for early modern textual documents: a review of capabilities and challenges with strategies for the future

Published: 2021-06-07
Volume: 77, Issue: 6, Pages: 1223-1247
ISSN: 0022-0418
Journal: Journal of Documentation (JD)
Language: en

Authors:

Marco Humbel, Julianne Nyhan, Andreas Vlachidis, Kim Sloan, Alexandra Ortolja-Baird

Abstract

Purpose

By mapping out the capabilities, challenges and limitations of named-entity recognition (NER), this article aims to synthesise the state of the art of NER in the context of the early modern research field and to inform discussions about the kinds of resources, methods and directions that may be pursued to enrich the application of the technique going forward.

Design/methodology/approach

Through an extensive literature review, this article maps out the current capabilities, challenges and limitations of NER and establishes the state of the art of the technique in the context of the early modern, digitally augmented research field. It also presents a new case study of NER research undertaken by Enlightenment Architectures: Sir Hans Sloane's Catalogues of his Collections (2016–2021), a Leverhulme-funded research project and collaboration between the British Museum and University College London, with contributing expertise from the British Library and the Natural History Museum.

Findings

Currently, it is not possible to benchmark the capabilities of NER as applied to documents of the early modern period. The authors also draw attention to the situated nature of authority files and to current conceptualisations of NER, which leads them to conclude that more robust reporting and critical analysis of NER approaches and findings are required.

Research limitations/implications

This article examines NER as applied to early modern textual sources, which are mostly studied by humanists. As discussed in the article, detailed reporting of NER processes and outcomes is not necessarily valued by the disciplines of the Humanities, so relevant data and metrics can be difficult to locate in project outputs. The authors have tried to mitigate this by contacting the projects discussed in the paper directly to verify the details reported.

Practical implications

The authors suggest that a forum is needed where tools are evaluated according to community standards. In the wider NER community, the MUC and CoNLL corpora are used for such experimental set-ups and are accompanied by conference series; they may serve as a useful model. The ultimate nature of such a forum must be discussed with the whole research community of the early modern domain.

Social implications

NER is an algorithmic intervention that transforms data according to certain rules, patterns or training data, and it ultimately affects how results are interpreted. The creation, use and promotion of algorithmic technologies like NER is not a neutral process, and neither is their output. This paper calls for a more critical understanding of the role and impact of NER on early modern documents and research, and for greater attention to some of the data- and human-centric aspects of NER routines that are currently overlooked.

Originality/value

This article presents a state-of-the-art snapshot of NER, its applications and potential, in the context of early modern research. It also seeks to inform discussions about the kinds of resources, methods and directions that may be pursued to enrich the application of NER going forward. It draws attention to the situated nature of authority files and to current conceptualisations of NER, and concludes that more robust reporting of NER approaches and findings is urgently required. The Appendix sets out a comprehensive summary of the digital tools and resources surveyed in the article.
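To make the benchmarking point concrete, the sketch below shows one common scoring convention, the exact-match entity-level precision, recall and F1 used in the CoNLL shared tasks: a predicted entity counts as correct only if both its span and its type match the gold annotation. This is a minimal sketch, not the official conlleval script and not the method used in the article; it assumes BIO-tagged token sequences (B-PER, I-PER, O, ...), and the function names `bio_to_spans` and `prf` and the toy catalogue-style sentence are invented for illustration.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence.

    A stray I- tag whose type differs from the open entity starts a new
    entity, which is how conlleval-style scorers usually treat it.
    """
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if starts_new or tag == "O":
            if etype is not None:
                spans.add((etype, start, i))  # close the currently open entity
            etype, start = (tag[2:], i) if tag != "O" else (None, i)
        # an I- tag of the same type simply extends the open entity
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 over sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)  # correct only if type and boundaries both match
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy example: "Brought by Captain Smith from Jamaica"
gold_tags = ["O", "O", "B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["O", "O", "O", "B-PER", "O", "B-LOC"]  # system misses the title "Captain"

if __name__ == "__main__":
    print(prf([gold_tags], [pred_tags]))  # (0.5, 0.5, 0.5)
```

Under this strict convention the partially recognised person name earns no credit at all, which is one reason published scores are only comparable when the corpus, tag set and matching rules are shared, as they are in the MUC and CoNLL set-ups.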

Publisher

Emerald

Subject

Library and Information Sciences, Information Systems
