This chapter highlights and discusses the special characteristics of learner corpus data and the challenges they may present for corpus compilation, annotation, and analysis. Because learner corpus and SLA researchers use their data to study L2 production and development, it is of utmost importance that the data are valid, that is, that they represent “authentic” L2 production: the data must stem from the studied learners’ own language production. I discuss challenges in three areas: (1) multilingual practices and metalinguistic language use, (2) lexical and constructional bias, often brought about by the wording of the task instructions or writing prompts that learners are asked to respond to, and (3) learner corpus annotation in view of the “discourse of deficit” in SLA. For each of these challenges, I offer solutions for how it can be met.
