Strategies and Lessons Learned During Cleaning of Data From Research Panel Participants: Cross-sectional Web-Based Health Behavior Survey Study

Published:2022-06-23 Issue:6 Volume:6 Page:e35797
ISSN:2561-326X
Container-title:JMIR Formative Research
language:en
Short-container-title:JMIR Form Res

Author:

Arevalo Mariana, Brownstein Naomi C, Whiting Junmin, Meade Cathy D, Gwede Clement K, Vadaparampil Susan T, Tillery Kristin J, Islam Jessica Y, Giuliano Anna R, Christy Shannon M

Abstract

Background: The use of web-based methods to collect population-based health behavior data has burgeoned over the past two decades. Researchers have used web-based platforms and research panels to study a myriad of topics. Data cleaning prior to statistical analysis of web-based survey data is an important step for data integrity. However, the data cleaning processes used by research teams are often not reported.

Objective: The objectives of this manuscript are to describe the use of a systematic approach to clean the data collected via a web-based platform from panelists and to share lessons learned with other research teams to promote high-quality data cleaning process improvements.

Methods: Data for this web-based survey study were collected from a research panel that is available for scientific and marketing research. Participants (N=4000) were panelists recruited either directly or through verified partners of the research panel, were aged 18 to 45 years, were living in the United States, had proficiency in the English language, and had access to the internet. Eligible participants completed a health behavior survey via Qualtrics. Informed by recommendations from the literature, our interdisciplinary research team developed and implemented a systematic and sequential plan to inform data cleaning processes. This included the following: (1) reviewing survey completion speed, (2) identifying consecutive responses, (3) identifying cases with contradictory responses, and (4) assessing the quality of open-ended responses. Implementation of these strategies is described in detail, and the Checklist for E-Survey Data Integrity is offered as a tool for other investigators.

Results: Data cleaning procedures resulted in the removal of 1278 out of 4000 (31.95%) response records, which failed one or more data quality checks. First, approximately one-sixth of records (n=648, 16.20%) were removed because respondents completed the survey unrealistically quickly (ie, <10 minutes). Next, 7.30% (n=292) of records were removed because they contained evidence of consecutive responses. A total of 4.68% (n=187) of records were subsequently removed due to instances of conflicting responses. Finally, a total of 3.78% (n=151) of records were removed due to poor-quality open-ended responses. Thus, after these data cleaning steps, the final sample contained 2722 responses, representing 68.05% of the original sample.

Conclusions: Examining data integrity and promoting transparency of data cleaning reporting is imperative for web-based survey research. Ensuring a high quality of data both prior to and following data collection is important. Our systematic approach helped eliminate records flagged as being of questionable quality. Data cleaning and management procedures should be reported more frequently, and systematic approaches should be adopted as standards of good practice in this type of research.
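The four sequential checks named in the Methods can be sketched in code. The following is only an illustrative sketch, not the authors' implementation: the abstract gives the 10-minute speed threshold but no other cutoffs, so the `Record` fields, the `MAX_IDENTICAL_RUN` value, the smoking-based contradiction rule, and the letter-count heuristic for open-ended answers are all hypothetical assumptions. "Consecutive responses" is interpreted here as a long run of identical answers, which is also an assumption. Each record is removed at the first check it fails, mirroring the sequential removal counts reported in the Results.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import groupby

MIN_MINUTES = 10        # speed threshold stated in the abstract
MAX_IDENTICAL_RUN = 8   # hypothetical cutoff; the abstract gives none


@dataclass
class Record:
    """Hypothetical shape of one survey response."""
    record_id: str
    minutes: float            # survey completion time
    grid: list                # answers to consecutive closed-ended items
    never_smoked: bool        # hypothetical item pair used for the
    cigarettes_per_day: int   # contradiction check
    open_text: str            # one open-ended answer


def too_fast(r):
    return r.minutes < MIN_MINUTES


def longest_run(values):
    """Length of the longest run of identical adjacent values."""
    return max((len(list(g)) for _, g in groupby(values)), default=0)


def consecutive(r):
    return longest_run(r.grid) >= MAX_IDENTICAL_RUN


def contradictory(r):
    # Assumed example rule: a never-smoker cannot smoke daily.
    return r.never_smoked and r.cigarettes_per_day > 0


def poor_open_text(r):
    # Crude stand-in for quality review: fewer than 3 letters.
    return sum(ch.isalpha() for ch in r.open_text) < 3


# Order matters: a record is counted under the first check it fails.
CHECKS = [
    ("speed", too_fast),
    ("consecutive", consecutive),
    ("contradictory", contradictory),
    ("open_ended", poor_open_text),
]


def clean(records):
    """Return (kept records, Counter of removals per check)."""
    kept, removed = [], Counter()
    for r in records:
        failed = next((name for name, check in CHECKS if check(r)), None)
        if failed is None:
            kept.append(r)
        else:
            removed[failed] += 1
    return kept, removed


sample = [
    Record("a", 25.0, [1, 3, 2, 4, 2, 5, 1, 3], False, 0, "I walk daily"),
    Record("b", 4.5, [1, 3, 2, 4, 2, 5, 1, 3], False, 0, "fine"),
    Record("c", 18.0, [3] * 8, False, 0, "ok then"),
    Record("d", 30.0, [1, 2, 1, 2, 1, 2, 1, 2], True, 10, "none really"),
    Record("e", 22.0, [2, 4, 1, 5, 3, 2, 4, 1], False, 0, "??"),
]

kept, removed = clean(sample)
print([r.record_id for r in kept], dict(removed))
```

Keeping the checks in an ordered list makes the sequence explicit and lets per-step removal counts be reported the way the Results do (648, 292, 187, and 151 records, totaling 1278 of 4000).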

Publisher

JMIR Publications Inc.

Subject

Health Informatics,Medicine (miscellaneous)

References (45 articles)

1. Internet surveys. Pew Research Center. 2021-12-15. https://www.pewresearch.org/politics/methodology/collecting-survey-data/internet-surveys/

2. Epidemiological data can be gathered with world wide web

3. Mechanical Turk upends social sciences

4. Reliability of MTurk Data From Masters and Workers

5. Recruitment for Online Access Panels

Cited by 10 articles.

1. Corporate social responsibility and brand performance: Evidence from Ghana;Journal of International Management;2024-08

2. Quantifying data quality after removing respondents who fail data quality checks;Current Issues in Tourism;2024-07-17

3. A Consolidated Framework for Implementation Research-based process to develop theoretically-informed human papillomavirus vaccination educational materials for young adults;Patient Education and Counseling;2024-06

4. Collecting Paediatric Health-Related Quality of Life Data: Assessing the Feasibility and Acceptability of the Australian Paediatric Multi-Instrument Comparison (P-MIC) Study;Children;2023-09-26

5. Bridge to health informatics—a 5-week intensive online program to increase diversity in health informatics;Frontiers in Education;2023-09-06