Affiliation:
1. TU Dortmund University, Germany and University of Stuttgart, Institute of Software Engineering, Germany
2. Technical University of Munich, TUM School of Computation, Information and Technology, Germany
Abstract
Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.
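The two-stage idea summarized above can be illustrated with a small simulation. The sketch below is not the authors' Stata or R code. It is a minimal example in plain Python, and the variable names, coefficients, and sample size are all assumptions chosen for illustration. An unobserved confounder `u` drives both the regressor `x` and the outcome `y`, so a naive regression of `y` on `x` is biased. An instrument `z` affects `x` but is unrelated to `u` and has no direct path to `y`.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=20000, true_effect=2.0, seed=1):
    """Simulate data with an omitted confounder (all coefficients are assumed)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)    # unobserved confounder, omitted from the models
        zi = rng.gauss(0, 1)   # instrument: independent of u, no direct effect on y
        xi = 1.0 * zi + 1.0 * u + rng.gauss(0, 1)          # endogenous regressor
        yi = true_effect * xi + 3.0 * u + rng.gauss(0, 1)  # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def two_stage(z, x, y):
    """Two-stage least squares with one instrument and one regressor."""
    # Stage 1: regress x on z and keep only the part of x explained by z.
    intercept, slope = ols(z, x)
    x_hat = [intercept + slope * zi for zi in z]
    # Stage 2: regress y on the fitted values from stage 1.
    return ols(x_hat, y)[1]


z, x, y = simulate()
naive = ols(x, y)[1]
iv = two_stage(z, x, y)
print(f"naive OLS slope: {naive:.2f}")
print(f"two-stage slope: {iv:.2f}")
```

Under these assumed coefficients, the naive slope should land near 3 rather than the simulated effect of 2, while the two-stage estimate should land near 2. Running the two stages by hand like this recovers the point estimate only. The second-stage standard errors it would produce are not valid, so for real analyses a dedicated instrumental-variable routine is the safer choice.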
Publisher
Association for Computing Machinery (ACM)