Abstract
This chapter discusses problems in interpreting corpus data that arise from insufficient annotation of named entities. Many corpora still do not allow users to set up queries that exclude items appearing in names, even where this is needed to improve search precision. Through case studies in major English-language corpora, the chapter highlights the need to post-process search results carefully, since irrelevant occurrences of named entities can complicate analyses of word frequencies and their collocational behaviour. The chapter calls for more detailed annotation of named entities in the large linguistic corpora already available and stresses the importance of closely inspecting search hits.
Publisher
John Benjamins Publishing Company