A longitudinal analysis of data quality in a large pediatric data research network

Authors:

Khare Ritu (1, 2), Utidjian Levon (1, 2), Ruth Byron J (1), Kahn Michael G (3), Burrows Evanette (1, 2), Marsolo Keith (4), Patibandla Nandan (5), Razzaghi Hanieh (2), Colvin Ryan (6), Ranade Daksha (7), Kitzmiller Melody (8), Eckrich Daniel (9), Bailey L Charles (1, 2, 10)

Affiliations:

1. Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA

2. Department of Pediatrics, Children’s Hospital of Philadelphia

3. Department of Pediatrics, University of Colorado Denver Anschutz Medical Campus, Aurora, CO, USA

4. University of Cincinnati Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA

5. Information Services Department, Children’s Hospital Boston, Boston, MA, USA

6. Department of Pediatrics, Washington University in St. Louis, St. Louis, MO, USA

7. Research Informatics, Seattle Children’s Research Institute, Seattle, WA, USA

8. Research Information Solutions and Innovation, Nationwide Children’s Hospital, Columbus, OH, USA

9. Center for Pediatric Auditory and Speech Sciences, Nemours Biomedical Research, Wilmington, DE, USA

10. Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

Abstract

Objective: PEDSnet is a clinical data research network (CDRN) that aggregates electronic health record data from multiple children’s hospitals to enable large-scale research. Assessing data quality to ensure suitability for conducting research is a key requirement in PEDSnet. This study presents a range of data quality issues identified over a period of 18 months and interprets them to evaluate the research capacity of PEDSnet.

Materials and Methods: Results were generated by a semiautomated data quality assessment workflow. Two investigators reviewed programmatic data quality issues and conducted discussions with the data partners’ extract-transform-load analysts to determine the cause for each issue.

Results: The results include a longitudinal summary of 2182 data quality issues identified across 9 data submission cycles. The metadata from the most recent cycle includes annotations for 850 issues: most frequent types, including missing data (>300) and outliers (>100); most complex domains, including medications (>160) and lab measurements (>140); and primary causes, including source data characteristics (83%) and extract-transform-load errors (9%).

Discussion: The longitudinal findings demonstrate the network’s evolution from identifying difficulties with aligning the data to a common data model to learning norms in clinical pediatrics and determining research capability.

Conclusion: While data quality is recognized as a critical aspect in establishing and utilizing a CDRN, the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.

Publisher

Oxford University Press (OUP)

Subject

Health Informatics
