Affiliation:
1. Georgia State University, USA
2. Politehnica University of Bucharest, Romania
Abstract
Lexical frequency benchmarks have been extensively used to investigate second language (L2) lexical sophistication, especially in language assessment studies. However, indices based on semantic co-occurrence, which may be a better representation of the experience language users have with lexical items, have not been sufficiently tested as benchmarks of lexical sophistication. To address this gap, we developed and tested indices based on semantic co-occurrence from two computational methods, namely, Latent Semantic Analysis and Word2Vec. The indices were developed from one L2 written corpus (i.e., EF Cambridge Open Language Database [EF-CAMDAT]) and one first language (L1) written corpus (i.e., Corpus of Contemporary American English [COCA] Magazine). Available L1 semantic context indices (i.e., Touchstone Applied Sciences Associates [TASA] indices) were also assessed. To validate the indices, they were used to predict L2 essay quality scores as judged by human raters. The models suggested that the semantic context indices developed from EF-CAMDAT and TASA, but not the COCA Magazine indices, explained unique variance in the presence of lexical sophistication measures. This study suggests that semantic context indices based on multi-level corpora, including L2 corpora, may provide a useful representation of the experience L2 writers have with input, which may assist with automatic scoring of L2 writing.
Subject
Linguistics and Language,Social Sciences (miscellaneous),Language and Linguistics
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献