Affiliation:
1. Heinrich Heine University Düsseldorf
Abstract
In recent years, many researchers have drawn attention to the fact that research results very often cannot be replicated, a phenomenon that has been called the replication crisis. The replication crisis in linguistics is highly relevant to corpus-based research: many corpus studies are not directly replicable because the data on which they are based are not readily available. Especially in English linguistics, the full versions of many widely used corpora are still behind paywalls, which means that they are not accessible to parts of the global research community. Even when parts of the data are freely accessible, such partial access presents problems for state-of-the-art methods of data analysis. In this paper, I discuss the challenges that have led to this situation and address some possible solutions. In particular, I argue for using smaller but openly available corpora whenever possible and for adopting open research practices to the greatest extent feasible even when using commercial corpora.
Publisher
John Benjamins Publishing Company