A Model-Assisted Approach for Finding Coding Errors in Manual Coding of Open-Ended Questions

Published: 2021-08-03
ISSN: 2325-0984
Journal: Journal of Survey Statistics and Methodology
Language: English

Authors:

He Zhoushanyue¹, Schonlau Matthias²

Affiliations:

1. Statistical Scientist in Pharma Tech, F. Hoffmann-La Roche AG, Canada

2. Department of Statistics and Actuarial Science, University of Waterloo, Canada

Abstract

Text answers to open-ended questions are typically coded manually into one of several codes. Usually, a random subset of text answers is double-coded to assess intercoder reliability, but most of the data remain single-coded. Any disagreement between the two coders points to an error by one of the coders. When the budget allows double-coding additional text answers, we propose employing statistical learning models to predict which single-coded answers have a high risk of a coding error. Specifically, we train a model on the double-coded random subset and predict the probability that the single-coded codes are correct. Then, the text answers with the highest risk are double-coded for verification. In experiments with three data sets, we found that, on average, this method identifies two to three times as many coding errors in the additional text answers as random guessing. We conclude that this method is preferable if the budget permits additional double-coding. When there are many intercoder disagreements, the benefit can be substantial.
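The selection procedure summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words features, the choice of plain logistic regression as the statistical learning model, the function names (`featurize`, `train_logistic`, `select_for_double_coding`) and the toy data are all hypothetical. The sketch also assumes that the first coder's code in the double-coded subset plays the role of the single code, and it treats agreement between the two coders as a proxy label for "the code is correct".

```python
import math
from collections import Counter


def featurize(text, code):
    """Hypothetical feature design: bag-of-words counts, an indicator for the
    code the coder assigned, and a constant bias term."""
    feats = Counter(text.lower().split())
    feats["__code=" + str(code)] += 1
    feats["__bias"] = 1
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Linear predictor w . x over the sparse feature dictionary.
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_logistic(examples, labels, epochs=200, lr=0.1, l2=0.01):
    """Logistic regression fitted by stochastic gradient descent (a stand-in
    for whatever model one prefers). Label 1 means the two coders agreed,
    0 means they disagreed. L2 shrinkage is applied only to the features
    present in the current example."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            g = sigmoid(score(w, x)) - y  # gradient of the log loss w.r.t. the score
            for f, v in x.items():
                old = w.get(f, 0.0)
                w[f] = old - lr * (g * v + l2 * old)
    return w


def select_for_double_coding(double_coded, single_coded, budget):
    """double_coded: list of (text, code_from_coder_1, code_from_coder_2).
    single_coded: list of (text, code).
    Returns the indices of the `budget` single-coded answers with the lowest
    predicted probability that their code is correct (highest error risk)."""
    X = [featurize(text, c1) for text, c1, _ in double_coded]
    y = [1 if c1 == c2 else 0 for _, c1, c2 in double_coded]
    w = train_logistic(X, y)
    p_correct = [sigmoid(score(w, featurize(text, code))) for text, code in single_coded]
    ranked = sorted(range(len(single_coded)), key=lambda i: p_correct[i])
    return ranked[:budget]


# Toy data, invented for illustration only.
double_coded = [
    ("good price", "A", "A"),
    ("low price", "A", "A"),
    ("fast service", "B", "B"),
    ("friendly service", "B", "B"),
    ("not sure price", "A", "B"),
    ("not sure service", "B", "A"),
]
single_coded = [
    ("good price", "A"),
    ("not sure about price", "A"),
    ("fast service", "B"),
    ("not sure about service", "B"),
]

if __name__ == "__main__":
    print(sorted(select_for_double_coding(double_coded, single_coded, budget=2)))
```

With a budget of two, the sketch flags the two answers containing words that, in the toy double-coded subset, occurred only alongside a disagreement. The random-guessing baseline mentioned in the abstract would instead draw `budget` indices uniformly, for example with `random.sample(range(len(single_coded)), budget)`.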

Publisher

Oxford University Press (OUP)

Subject

Applied Mathematics; Statistics, Probability and Uncertainty; Social Sciences (miscellaneous); Statistics and Probability

Link

http://academic.oup.com/jssam/advance-article-pdf/doi/10.1093/jssam/smab022/39549798/smab022.pdf

References: 19 articles

1. Ames. A Web-Based Program for Coding Open-Ended Response Protocols. Behavior Research Methods, 2005.

2. Crittenden. Coding Reliability and Validity of Interview Data. American Sociological Review, 1971.

Cited by: 4 articles

1. Multi-label classification of open-ended questions with BERT. 2023 Big Data Meets Survey Science (BigSurv), 2023-10-26.

2. A Score Function to Prioritize Editing in Household Survey Data: A Machine Learning Approach. Documentos de Trabajo, 2023-10-19.

3. A Review of Text Answer Coding Methods for Open-Ended Questions in Questionnaires. Statistics and Application, 2023.

4. The semi-automatic classification of an open-ended question on panel survey motivation and its application in attrition analysis. Frontiers in Big Data, 2022-08-11.