Guidelines for normalising Early Modern English corpora: Decisions and justifications

Published:2015-03-01 Issue:1 Volume:39 Page:5-24
ISSN:1502-5462
Container-title:ICAME Journal
language:en

Author:

Archer Dawn¹, Kytö Merja², Baron Alistair³, Rayson Paul³

Affiliation:

1. University of Central Lancashire

2. Uppsala University

3. Lancaster University

Abstract

Corpora of Early Modern English have been collected and released for research for a number of years. With large-scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics, including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semi-automatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.

Publisher

Walter de Gruyter GmbH

Cited by 37 articles.

1. A search tool based on language modelling developed for The Index of Middle English Prose;Open Research Europe;2024-03-11

2. A search tool based on language modelling developed for The Index of Middle English Prose;Open Research Europe;2023-11-14

3. Take Help from Elder Brother: Old to Modern English NMT with Phrase Pair Feedback;Computational Linguistics and Intelligent Text Processing;2023

4. Textual variations affect human judgements of sentiment values;Electronic Commerce Research and Applications;2022-05

5. Supporting the corpus-based study of Shakespeare’s language: Enhancing a corpus of the First Folio;ICAME Journal;2021-05-01