Generating sequential electronic health records using dual adversarial autoencoder

Authors:

Lee Dongha1, Yu Hwanjo1, Jiang Xiaoqian2, Rogith Deevakar2, Gudala Meghana2, Tejani Mubeen2, Zhang Qiuchen3, Xiong Li3

Affiliation:

1. Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea

2. School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA

3. Department of Computer Science, Emory University, Atlanta, Georgia, USA

Abstract

Objective: Recent studies on electronic health records (EHRs) have begun to learn deep generative models and synthesize large numbers of realistic records in order to address the significant privacy issues surrounding EHRs. However, most of them focus only on structured records of patients’ independent visits rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs based on a generative autoencoder.

Materials and Methods: We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated the performance of DAAE in terms of predictive modeling, plausibility, and privacy preservation.

Results: Our generated sequences of EHRs showed performance comparable to that of real data in a predictive modeling task and achieved the best score among all baseline models in the plausibility evaluation conducted by medical experts. In addition, differentially private optimization of our model makes it possible to generate synthetic sequences without increasing the privacy leakage of patients’ data.

Conclusions: DAAE can effectively synthesize sequential EHRs by addressing the main challenges of the task: the synthetic records should be realistic enough that they cannot be distinguished from real records, and they should cover all the training patients so that the performance of specific downstream tasks can be reproduced.
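The "set-valued sequences of medical entities" named in the abstract can be pictured as follows: each patient is a chronological list of visits, and each visit is an unordered set of codes. The sketch below is only an illustration of that data shape, not the authors' preprocessing. The code names, the helper functions `build_vocab`, `encode_patient`, and `decode_patient`, and the choice of a multi-hot encoding are all assumptions made for this example.

```python
from typing import Dict, List, Set

# Assumption: a patient is a chronological list of visits, and each visit
# is an unordered set of medical codes. The code names below are made up.
Patient = List[Set[str]]


def build_vocab(patients: List[Patient]) -> Dict[str, int]:
    """Map every distinct code to a column index, in sorted order."""
    codes = sorted({code for p in patients for visit in p for code in visit})
    return {code: i for i, code in enumerate(codes)}


def encode_patient(patient: Patient, vocab: Dict[str, int]) -> List[List[int]]:
    """Turn one patient into a list of multi-hot vectors, one per visit."""
    rows = []
    for visit in patient:
        row = [0] * len(vocab)
        for code in visit:
            row[vocab[code]] = 1
        rows.append(row)
    return rows


def decode_patient(rows: List[List[int]], vocab: Dict[str, int]) -> Patient:
    """Invert encode_patient: recover the set of codes for each visit."""
    inverse = {i: code for code, i in vocab.items()}
    return [{inverse[i] for i, bit in enumerate(row) if bit} for row in rows]


# Hypothetical toy records: two patients with two visits and one visit.
patients = [
    [{"dx_diabetes", "rx_metformin"}, {"dx_diabetes", "lab_hba1c"}],
    [{"dx_hypertension"}],
]
vocab = build_vocab(patients)
encoded = encode_patient(patients[0], vocab)
```

Under this assumed encoding, a sequence model would read one multi-hot row per visit, and a generated sequence could be mapped back to code sets with `decode_patient`.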

Publisher

Oxford University Press (OUP)

Subject

Health Informatics


Cited by 33 articles.

1. Evaluation of synthetic electronic health records: A systematic review and experimental assessment;Neurocomputing;2024-10

2. GAN-Based Privacy-Preserving Intelligent Medical Consultation Decision-Making;Group Decision and Negotiation;2024-09-12

3. On the evaluation of synthetic longitudinal electronic health records;BMC Medical Research Methodology;2024-08-14

4. SoK: Privacy-Preserving Data Synthesis;2024 IEEE Symposium on Security and Privacy (SP);2024-05-19

5. Human Resources Optimization for Public Space Security;Advances in Information Security, Privacy, and Ethics;2024-05-16
