Affiliation:
1. Mechanical and Industrial Engineering Department, University of Illinois Chicago, Chicago, IL 60612, USA
2. Department of Educational Leadership and Policy, University of Texas at Austin, Austin, TX 78712, USA
Abstract
The education sector has been quick to recognize the power of predictive analytics to enhance student success rates. However, there are challenges to widespread adoption, including the lack of accessibility and the potential perpetuation of inequalities. These challenges arise at different stages of modeling, including data preparation, model development, and evaluation, and each of these steps can introduce additional bias into the system if not performed appropriately. Large-scale, nationally representative education data commonly suffer from substantial incompleteness in responses, and the resulting missing data can compromise the representativeness and accuracy of the results. While many education-related studies address the challenges of missing data, little is known about how the handling of missing values affects the fairness of predictive outcomes in practice. In this paper, we assess the disparities in predictive modeling outcomes for college student success and investigate the impact of imputation techniques on model performance and on fairness under several fairness notions. We conduct a prospective evaluation, which provides a less biased estimate of future performance and fairness than an evaluation on historical data. Our comprehensive analysis of a real large-scale education dataset reveals key insights on modeling disparities and on the impact of imputation techniques on the fairness of the predictive outcome under different testing scenarios. Our results indicate that imputation introduces bias if the testing set follows the historical distribution. However, if the injustice in society is addressed and, consequently, the upcoming batch of observations is equalized, the model would be less biased.
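As a rough illustration of the question the abstract raises, the sketch below compares one common fairness notion, statistical parity difference, with and without imputation. It is not the paper's method or data: the toy records, the `gpa` feature, the fixed `threshold` rule standing in for a trained model, and the helper names are all hypothetical, and overall-mean imputation is only one of many possible techniques.

```python
from statistics import mean

def mean_impute(values):
    """Replace None with the mean of the observed values (overall-mean imputation)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def statistical_parity_difference(predictions, groups, privileged):
    """Positive-prediction rate of the unprivileged group minus that of the privileged group."""
    priv = [p for p, g in zip(predictions, groups) if g == privileged]
    unpriv = [p for p, g in zip(predictions, groups) if g != privileged]
    return mean(unpriv) - mean(priv)

# Hypothetical toy data: group "b" has more missing values than group "a".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gpa = [3.6, 3.2, 2.4, 3.0, 3.4, None, None, 2.2]
threshold = 3.0  # assumed decision rule standing in for a trained classifier

# Scenario 1: complete-case analysis (rows with a missing value are dropped).
kept = [i for i, v in enumerate(gpa) if v is not None]
preds_cc = [1 if gpa[i] >= threshold else 0 for i in kept]
spd_complete_case = statistical_parity_difference(
    preds_cc, [groups[i] for i in kept], privileged="a"
)

# Scenario 2: overall-mean imputation, then the same decision rule.
preds_imp = [1 if v >= threshold else 0 for v in mean_impute(gpa)]
spd_imputed = statistical_parity_difference(preds_imp, groups, privileged="a")

print(spd_complete_case, spd_imputed)
```

In this toy setting the imputed values for group "b" fall just below the threshold, so the gap between the two groups widens after imputation. The direction and size of such an effect depend entirely on the data and on the imputation technique chosen.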
Funder
Institute of Education Sciences
Cited by
4 articles.