Machine Learning Predictions as Regression Covariates

Published: 2020-11-11 Issue: 4 Volume: 29 Page: 467-484
ISSN: 1047-1987
Container-title: Political Analysis
Language: en
Short-container-title: Polit. Anal.

Author:

Fong Christian, Tyler Matthew

Abstract

In text, images, merged surveys, voter files, and elsewhere, data sets are often missing important covariates, either because they are latent features of observations (such as sentiment in text) or because they are not collected (such as race in voter files). One promising approach for coping with this missing data is to find the true values of the missing covariates for a subset of the observations and then train a machine learning algorithm to predict the values of those covariates for the rest. However, plugging in these predictions without regard for prediction error renders regression analyses biased, inconsistent, and overconfident. We characterize the severity of the problem posed by prediction error, describe a procedure to avoid these inconsistencies under comparatively general assumptions, and demonstrate the performance of our estimators through simulations and a study of hostile political dialogue on the Internet. We provide software implementing our approach.
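
The bias the abstract attributes to plugging in predictions can be illustrated with a small simulation. The sketch below is not the authors' estimator or software; it is a hypothetical setup assumed only for illustration: a binary covariate, a classifier that flips the true label with a fixed error rate independent of the outcome, and a one-variable least-squares fit. Under these assumptions the plug-in slope shrinks toward zero by a factor of about (1 - 2 × error rate).

```python
import random


def ols_slope(x, y):
    """Slope of a one-variable least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=20000, beta=2.0, error_rate=0.2, seed=0):
    """Compare the slope using the true covariate with the plug-in slope.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    x_true, x_pred, y = [], [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0            # true binary covariate
        # Hypothetical classifier: flips the label with probability error_rate.
        x_hat = 1 - x if rng.random() < error_rate else x
        x_true.append(x)
        x_pred.append(x_hat)
        y.append(1.0 + beta * x + rng.gauss(0.0, 1.0))  # outcome depends on true x
    return ols_slope(x_true, y), ols_slope(x_pred, y)


slope_true, slope_plugin = simulate()
print(f"slope with true covariate:      {slope_true:.3f}")
print(f"slope with predicted covariate: {slope_plugin:.3f}")
```

With the true coefficient set to 2 and a 20% error rate, the first slope should land near 2 and the second near 1.2. Collecting more observations does not close this gap, which is what makes the plug-in estimate inconsistent and not just noisy.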

Publisher

Cambridge University Press (CUP)

Subject

Political Science and International Relations, Sociology and Political Science

References: 23 articles.

1. Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., Weingessel, A., and Leisch, M. F. (2009). “Package ‘e1071’.” R Software package, http://cran.rproject.org/web/packages/e1071/index.html.

2. Race and Representation in Campaign Finance

3. Anti-Americanism and Anti-Interventionism in Arabic Twitter Discourses

Cited by 16 articles.

1. Improving Probabilistic Models In Text Classification Via Active Learning;American Political Science Review;2024-08-05

2. Measuring and Modeling Neighborhoods;American Political Science Review;2024-02-02

3. Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!;Communication Methods and Measures;2024-01-16

4. DeepKriging: Spatially Dependent Deep Neural Networks for Spatial Prediction;Statistica Sinica;2024

5. Machine Learning for Causal Inference: Is a Nonlinear First Stage Really Forbidden in 2SLS?;SSRN Electronic Journal;2024