Affiliation:
1. University of California, Santa Barbara
2. Justus Liebig University Giessen
Abstract
This paper critically discusses how corpus linguistics, and learner corpus research in particular, has been dealing with
frequency data of all sorts, and with over- and underuse frequencies especially. On the basis of learner corpus data, I demonstrate
the pitfalls of two problems that unfortunately characterize much work: relying on aggregate data and lacking statistical control.
In fact, I will demonstrate that monofactorial methods have very little to offer to research on observational data at all. While
this paper is admittedly very didactic and methodological, I think the discussion of the empirical data offered here (a
reanalysis of previously published work) shows how misleading many studies potentially are and has far-reaching implications for
much of corpus linguistics and learner corpus research. Ideally, this paper, together with Paquot & Plonsky (2017, International Journal of Learner Corpus Research), would lead to a complete
revision of how learner corpus linguists use quantitative methods and study over-/underuse; minimally, it would stimulate
a much-needed discussion of the methodological sophistication that is currently lacking.
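The pitfall of aggregate data can be made concrete with a small, purely hypothetical example. The speaker counts below are invented for illustration and do not come from the paper. The sketch compares a learner/native ratio computed from pooled corpus frequencies with one computed from per-speaker frequencies. The names `LEARNERS`, `NATIVES`, `pooled_rate`, and `per_speaker_rates` are illustrative assumptions. This is not the paper's own analysis, which argues for multifactorial modeling rather than any single summary statistic.

```python
from statistics import median

# Hypothetical per-speaker data: (occurrences of a target word, total word tokens).
# These numbers are invented purely for illustration.
LEARNERS = [(40, 1000), (2, 1000), (1, 1000), (3, 1000)]
NATIVES = [(5, 1000), (4, 1000), (6, 1000), (5, 1000)]


def pooled_rate(speakers):
    """Relative frequency after collapsing all speakers into one big corpus."""
    hits = sum(h for h, _ in speakers)
    tokens = sum(t for _, t in speakers)
    return hits / tokens


def per_speaker_rates(speakers):
    """Relative frequency computed separately for each speaker."""
    return [h / t for h, t in speakers]


if __name__ == "__main__":
    # Pooled comparison: one heavy user dominates the learner total.
    pooled_ratio = pooled_rate(LEARNERS) / pooled_rate(NATIVES)
    # Per-speaker comparison: the typical (median) speaker in each group.
    median_ratio = median(per_speaker_rates(LEARNERS)) / median(per_speaker_rates(NATIVES))
    print(f"pooled learner/native ratio:   {pooled_ratio:.2f}")
    print(f"median per-speaker ratio:      {median_ratio:.2f}")
```

With these invented numbers, the pooled ratio suggests that learners "overuse" the word by a factor of about 2.3. The median per-speaker ratio of 0.5 shows that the typical learner actually uses it less often than the typical native speaker. A single learner drives the aggregate result, which is exactly the kind of speaker-level variation that pooled frequencies hide.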
Publisher
John Benjamins Publishing Company