Evaluating data quality for blended data using a data quality framework

Authors:

Parker Jennifer D.1, Mirel Lisa B.2, Lee Philip3, Mintz Ryan4, Tungate Andrew5, Vaidyanathan Ambarish6

Affiliation:

1. National Center for Health Statistics, Centers for Disease Control and Prevention, U.S. Department of Health and Human Services, Hyattsville, MD, USA

2. National Center for Science and Engineering Statistics,

3. Administration for Children and Families, U.S. Department of Health and Human Services, Washington, DC, USA

4. Office of the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services, Washington, DC, USA

5. Centers for Medicare and Medicaid Services, U.S. Department of Health and Human Services, Baltimore, MD, USA

6. National Center for Environmental Health, Centers for Disease Control and Prevention, U.S. Department of Health and Human Services, Atlanta, GA, USA

Abstract

In 2020 the U.S. Federal Committee on Statistical Methodology (FCSM) released “A Framework for Data Quality”, which is organized around 11 dimensions of data quality grouped into three domains (utility, objectivity, and integrity). This paper addresses the use of the FCSM Framework for assessing the quality of blended data. The FCSM Framework applies to all types of data; however, best practices for implementing it have not been documented. We applied the FCSM Framework to three health-research-related case studies. For each case study, we assessed the data quality dimensions to identify threats to quality, possible mitigations of those threats, and trade-offs among them. From these assessments the authors concluded that: 1) data quality assessments are more complex in practice than anticipated, and expert guidance and documentation are important; 2) the dimensions may not be equally important for different data uses; 3) data quality assessments can be subjective, and a quantitative tool could help explain the results; however, quantitative assessments may be closely tied to the intended use of the dataset; and 4) some threats to quality have trade-offs and mitigations that are common across dimensions. This paper is one of the first to apply the FCSM Framework to specific use cases, and it illustrates a process for similar data uses.
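The abstract notes that a quantitative tool could help explain assessment results, but that any such tool is tied to the intended use of the data. The paper does not describe a specific scoring method, so the following is only a minimal sketch under stated assumptions. It groups the 11 FCSM dimensions into their three domains, records the threats and mitigations found for each dimension, and computes a weighted mean rating per domain. The 1 to 5 rating scale, the per-use weights, and the names `DimensionAssessment` and `domain_scores` are all hypothetical illustrations, not part of the FCSM Framework or the authors' method.

```python
from dataclasses import dataclass, field

# The 11 FCSM (2020) data quality dimensions, grouped by domain.
FCSM_DOMAINS: dict[str, tuple[str, ...]] = {
    "utility": ("relevance", "accessibility", "timeliness", "punctuality", "granularity"),
    "objectivity": ("accuracy and reliability", "coherence"),
    "integrity": (
        "scientific integrity",
        "credibility",
        "computer and physical security",
        "confidentiality",
    ),
}

ALL_DIMENSIONS = tuple(d for dims in FCSM_DOMAINS.values() for d in dims)


@dataclass
class DimensionAssessment:
    """Hypothetical record of one dimension's assessment for one intended data use."""

    rating: int  # hypothetical ordinal scale: 1 (serious threats) to 5 (no known threats)
    weight: float = 1.0  # hypothetical importance of this dimension for the intended use
    threats: list[str] = field(default_factory=list)
    mitigations: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        if self.weight < 0:
            raise ValueError("weight must be non-negative")


def domain_scores(assessments: dict[str, DimensionAssessment]) -> dict[str, float]:
    """Weighted mean rating per FCSM domain.

    Dimensions that were not assessed, or that have weight 0 for this use,
    are skipped; a domain with no weighted assessments is omitted.
    """
    unknown = set(assessments) - set(ALL_DIMENSIONS)
    if unknown:
        raise ValueError(f"not FCSM dimensions: {sorted(unknown)}")
    scores: dict[str, float] = {}
    for domain, dims in FCSM_DOMAINS.items():
        rated = [assessments[d] for d in dims if d in assessments and assessments[d].weight > 0]
        total_weight = sum(a.weight for a in rated)
        if total_weight == 0:
            continue
        scores[domain] = sum(a.rating * a.weight for a in rated) / total_weight
    return scores


if __name__ == "__main__":
    # Same ratings, two hypothetical data uses that weight timeliness differently.
    ratings = {"relevance": 4, "timeliness": 2, "granularity": 5}
    use_a = {d: DimensionAssessment(r) for d, r in ratings.items()}
    use_b = {
        d: DimensionAssessment(r, weight=3.0 if d == "timeliness" else 1.0)
        for d, r in ratings.items()
    }
    print(domain_scores(use_a))
    print(domain_scores(use_b))
```

With equal weights the utility score is (4 + 2 + 5) / 3, about 3.67; weighting timeliness three times as heavily lowers it to 3.0. Identical dimension-level findings can therefore summarize differently depending on the data use, which is consistent with the abstract's caution that quantitative assessments are closely tied to intended use.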

Publisher

IOS Press
