Author:
Sinclair Jeanne, Davies Scott, Janus Magdalena
Abstract
Introduction
Longitudinal data that track student achievement over many years are crucial for understanding children's learning and for guiding effective policies and interventions. Despite being Canada's most populous province, Ontario lacks such large-scale longitudinal data on student learning. Linking datasets across cohorts requires rigorous linkage protocols, flexible handling of complex cohort structures, methods to validate linked datasets, and viable organizational partnerships. We linked administrative data on early child development and educational achievement and merged two datasets on characteristics of students' neighborhoods and schools. We developed a linkage protocol and assessed how well the resulting database generalizes to Ontario's student population.
Methods and analysis
Two main individual-level data sources were linked: 1) the Early Development Instrument (EDI), a school readiness assessment of all Ontario public school kindergartners that is administered in three-year cycles, and 2) the math and reading assessments administered by Ontario's Education Quality and Accountability Office (EQAO) in grades 3, 6, 9, and 10. Because the two sources lack a common personal identification number, a deterministic linkage process was developed using several administrative variables. A school-level dataset and a neighborhood-level dataset were subsequently linked as well. We examined differences between unlinked and linked cases across several variables.
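As an illustration only, deterministic linkage without a shared identifier can be sketched as exact matching on a composite key built from administrative variables, accepting a link only when the key is unique on both sides. The field names (`birth_date`, `sex`, `postal_code`, `edi_id`, `eqao_id`), the normalization rule, and the sample records below are all hypothetical; the abstract does not specify which variables or matching rules the authors actually used.

```python
from collections import defaultdict

# Hypothetical linkage variables; the source does not name the fields used.
KEY_FIELDS = ("birth_date", "sex", "postal_code")


def make_key(record):
    """Build a normalized match key; return None if any field is missing."""
    parts = []
    for field in KEY_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return None
        # Assumed normalization: trim, uppercase, drop internal spaces.
        parts.append(str(value).strip().upper().replace(" ", ""))
    return tuple(parts)


def deterministic_link(edi_records, eqao_records):
    """Link records whose keys match exactly and are unique in both files."""
    eqao_index = defaultdict(list)
    for rec in eqao_records:
        key = make_key(rec)
        if key is not None:
            eqao_index[key].append(rec)

    edi_key_counts = defaultdict(int)
    for rec in edi_records:
        key = make_key(rec)
        if key is not None:
            edi_key_counts[key] += 1

    linked, unlinked = [], []
    for rec in edi_records:
        key = make_key(rec)
        candidates = eqao_index.get(key, []) if key is not None else []
        # Accept only unambiguous one-to-one matches.
        if len(candidates) == 1 and edi_key_counts[key] == 1:
            linked.append((rec["edi_id"], candidates[0]["eqao_id"]))
        else:
            unlinked.append(rec["edi_id"])
    return linked, unlinked


# Made-up sample records, for demonstration only.
edi = [
    {"edi_id": "E1", "birth_date": "2005-03-14", "sex": "F", "postal_code": "L8S 4L8"},
    {"edi_id": "E2", "birth_date": "2005-07-02", "sex": "M", "postal_code": "M5S 1A1"},
    {"edi_id": "E3", "birth_date": "2005-11-30", "sex": "F", "postal_code": None},
]
eqao = [
    {"eqao_id": "Q1", "birth_date": "2005-03-14", "sex": "f", "postal_code": "l8s4l8"},
    {"eqao_id": "Q2", "birth_date": "2005-07-02", "sex": "M", "postal_code": "M5S 1A1"},
    {"eqao_id": "Q3", "birth_date": "2005-07-02", "sex": "M", "postal_code": "M5S1A1"},
]

linked, unlinked = deterministic_link(edi, eqao)
linkage_rate = len(linked) / len(edi)
```

In this sketch, records with a missing key field or with more than one candidate match are left unlinked rather than guessed. That keeps false links low at the cost of a lower linkage rate, which is why linked and unlinked cases then need to be compared.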
Results and implications
We successfully linked 50% of the EDI's 374,239 cases, 86,778 of which contained all five datapoints, creating a database that tracks achievement for multiple cohorts from kindergarten through grade 10, with covariates for their development, demographics, affect, neighborhoods, and schools. Analyses revealed only negligible differences between linked and unlinked cases across several demographic measures, while small differences were detected on a neighborhood socioeconomic index and on some measures of child development. In conclusion, we recommend filling key gaps in sustainable research capacity by creating representative data through linkage protocols and data verification.
Subject
Information Systems and Management, Health Informatics, Information Systems, Demography