Author:
Andreas Schmitz, Jan R. Riebling
Abstract
Digital process data are becoming increasingly important for social science research, but their quality has been gravely neglected so far. In this article, we adopt a process perspective and argue that data extracted from socio-technical systems are, in principle, subject to the same error-inducing mechanisms as traditional forms of social science data, namely biases that arise before their acquisition (observational design), during their acquisition (data generation), and after their acquisition (data processing). As the lack of access to and insight into the actual processes of data production renders key traditional mechanisms of quality assurance largely impossible, it is essential to identify data quality problems in the data available, that is, to focus on the possibilities that post-hoc quality assessment offers. We advance a post-hoc strategy of data quality assurance, integrating simulation and explorative identification techniques. As a use case, we illustrate this approach with the example of bot activity and the effects this phenomenon can have on digital process data. First, we employ agent-based modelling to simulate datasets containing these data problems. Subsequently, we demonstrate the possibilities and challenges of post-hoc control by mobilizing geometric data analysis, an exemplary technique for identifying data quality issues.
Funder
GESIS – Leibniz-Institut für Sozialwissenschaften e.V.
Publisher
Springer Science and Business Media LLC
Subject
Sociology and Political Science, Social Psychology
Cited by
2 articles.