Temporal condition pattern mining in large, sparse electronic health record data: A case study in characterizing pediatric asthma

Author:

Campbell Elizabeth A [1,2], Bass Ellen J [1,3], Masino Aaron J [1,4]

Affiliation:

1. Department of Information Science, College of Computing & Informatics, Drexel University, Philadelphia, Pennsylvania, USA

2. Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA

3. Department of Health Systems and Sciences Research, College of Nursing & Health Professions, Philadelphia, Pennsylvania, USA

4. Department of Anesthesiology and Critical Care, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

Abstract

Objective: This study introduces a temporal condition pattern mining methodology to address the sparse nature of coded condition concept utilization in electronic health record data. As a validation study, we applied this method to reveal condition patterns surrounding an initial diagnosis of pediatric asthma.

Materials and Methods: The SPADE (Sequential PAttern Discovery using Equivalence classes) algorithm was used to identify common temporal condition patterns surrounding the initial diagnosis of pediatric asthma in a study population of 71 824 patients from the Children’s Hospital of Philadelphia. SPADE was applied to a dataset with diagnoses coded using International Classification of Diseases (ICD) concepts and separately to a dataset with the ICD codes mapped to their corresponding expanded diagnostic clusters (EDCs). Common temporal condition patterns surrounding the initial diagnosis of pediatric asthma ascertained by SPADE from both the ICD and EDC datasets were compared.

Results: SPADE identified 36 unique diagnoses in the mapped EDC dataset, whereas only 19 were recognized in the ICD dataset. Temporal trends in condition diagnoses ascertained from the EDC data were not discoverable in the ICD dataset.

Discussion: Mining frequent temporal condition patterns from large electronic health record datasets may reveal previously unknown associations between diagnoses that could inform future research into causation or other relationships. Mapping sparsely coded medical concepts into homogenous groups was essential to discovering potentially useful information from our dataset.

Conclusions: We expect that the presented methodology is applicable to the study of diagnostic trajectories for other clinical conditions and can be extended to study temporal patterns of other coded medical concepts such as medications and procedures.
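The abstract's central point, that mapping sparse codes into broader groups can expose temporal patterns that are not frequent at the raw-code level, can be illustrated with a small sketch. This is not SPADE and not the authors' pipeline: SPADE works on vertical id-lists partitioned into equivalence classes, whereas the code below is a brute-force support count for ordered pairs only. The mapping table, the toy patient histories, the function names, and the 0.75 support threshold are all assumptions made for illustration; the mapping is not the real ICD-to-EDC table.

```python
from collections import Counter

# Hypothetical code-to-group mapping (assumption; NOT the real ICD-to-EDC table).
CODE_TO_GROUP = {
    "J45.20": "asthma",
    "J45.30": "asthma",
    "J06.9": "upper_respiratory_infection",
    "J00": "upper_respiratory_infection",
    "L20.9": "dermatitis",
    "L30.9": "dermatitis",
}

# Toy data (assumption): each patient is a time-ordered list of visits,
# and each visit is a set of diagnosis codes.
patients = [
    [{"J06.9"}, {"L20.9"}, {"J45.20"}],
    [{"J00"}, {"J45.30"}],
    [{"L30.9"}, {"J00"}, {"J45.20"}],
    [{"J06.9"}, {"J45.30"}],
]


def map_codes(history, mapping):
    """Replace every code in every visit with its broader group."""
    return [{mapping[code] for code in visit} for visit in history]


def ordered_pair_support(histories):
    """Count, per ordered pair (a, b), how many patients have a in an
    earlier visit and b in a strictly later visit."""
    counts = Counter()
    for history in histories:
        seen = set()  # each patient contributes at most once per pair
        for i, earlier in enumerate(history):
            for later in history[i + 1:]:
                for a in earlier:
                    for b in later:
                        seen.add((a, b))
        counts.update(seen)
    return counts


def frequent_pairs(histories, min_support):
    """Keep ordered pairs whose relative support meets the threshold."""
    n = len(histories)
    return {
        pair: count / n
        for pair, count in ordered_pair_support(histories).items()
        if count / n >= min_support
    }


raw = frequent_pairs(patients, min_support=0.75)
grouped = frequent_pairs(
    [map_codes(p, CODE_TO_GROUP) for p in patients], min_support=0.75
)
print("raw codes:", raw)
print("grouped:", grouped)
```

In this toy data no ordered pair of raw codes appears in more than one patient, so nothing clears the threshold, while the grouped data yields a single pattern (an upper respiratory infection followed by asthma) that is present in every patient. A real analysis would use an actual sequential pattern mining implementation and a validated mapping.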

Funder

Commonwealth Universal Research Enhancement

Pennsylvania Department of Health

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

