Exploring Reusability and Reproducibility for a Research Infrastructure for L1 and L2 Learner Corpora

Author:

König Alexander, Frey Jennifer-Carmen, Stemle Egon W.

Abstract

To date, research in educational and linguistic domains such as learner corpus research, writing research, and second language acquisition has produced a substantial amount of research data in the form of L1 and L2 learner corpora. However, the multitude of individual solutions, combined with domain-inherent obstacles to data sharing, has so far hampered the comparability, reusability and reproducibility of data and research results. In this article, we present our work on creating a digital infrastructure for L1 and L2 learner corpora and populating it with data collected in the past. We embed our infrastructure efforts in the broader field of infrastructures for scientific research, drawing on technical solutions and frameworks from research data management, among them the FAIR Guiding Principles for data stewardship. We share our experiences from integrating some L1 and L2 learner corpora from concluded projects into the infrastructure while trying to ensure compliance with the FAIR principles and with the standards we established for reproducibility, and we discuss how far research data collected in the past can be made comparable, reusable and reproducible. Our results show that some basic needs for providing comparable and reusable data are covered by existing general infrastructure solutions, which can be exploited for domain-specific infrastructures such as the one presented in this article. Other aspects need genuinely domain-driven approaches. The solutions found for the corpora in the presented infrastructure can only be a preliminary attempt, and further community involvement is needed to provide templates and models that the community acknowledges and promotes. Furthermore, forward-looking data management is needed from the very beginning of new corpus creation projects to ensure that all requirements for FAIR data can be met.

Publisher

MDPI AG

Subject

Information Systems

Cited by 3 articles.

1. The Core Metadata Schema for Learner Corpora (LC-meta);International Journal of Learner Corpus Research;2024-09-13

2. Learner corpus research: a critical appraisal and roadmap for contributing (more) to SLA research agendas;Corpus Linguistics and Linguistic Theory;2024-05-14

3. QUEST: Guidelines and Specifications for the Assessment of Audiovisual, Annotated Language Data;Working Papers in Corpus Linguistics and Digital Technologies: Analyses and Methodology;2022-12-01
