Affiliation:
1. University of Pittsburgh
Abstract
This report introduces the University of Pittsburgh English Language Institute Corpus (PELIC;
Juffs et al., 2020), a publicly available 4.2-million-word learner corpus of
written texts. Collected over seven years in the University of Pittsburgh’s Intensive English Program, these texts were produced
by more than 1,100 students with diverse linguistic backgrounds and proficiency levels. Unlike most learner corpora, which are
cross-sectional, PELIC is longitudinal, offering greater opportunities for tracking development in a natural classroom setting.
This potential is illustrated in an overview of the research conducted to date with these data. The report also provides a
description of PELIC’s creation and contents, including how the texts have been managed to facilitate natural language processing.
Overall, the corpus contributes to the field of learner corpus research by adding to the pool of freely and publicly available
learner corpora, supplemented by a useful set of Python tools and tutorials for accessing these data.
Publisher
John Benjamins Publishing Company
Subject
Linguistics and Language, Education, Language and Linguistics
Cited by
7 articles.