Improving an Electronic Health Record–Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach

Published: 2023-06-13 Volume: 11 Page: e47862
ISSN: 2291-9694
Container-title: JMIR Medical Informatics
Language: en
Short-container-title: JMIR Med Inform

Author:

Li Runze, Tian Yu, Shen Zhuyi, Li Jin, Li Jun, Ding Kefeng, Li Jingsong

Abstract

Background Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, data label inaccessibility is an increasingly important issue in clinical prediction, despite the use of synthetic and semisupervised learning from data. Little research has aimed to uncover the underlying graphical structure of EHRs.
Objective A network-based generative adversarial semisupervised method is proposed. The objective is to train clinical prediction models on label-deficient EHRs to achieve comparable learning performance to supervised methods.
Methods Three public data sets and one colorectal cancer data set gathered from the Second Affiliated Hospital of Zhejiang University were selected as benchmarks. The proposed models were trained on 5% to 25% labeled data and evaluated on classification metrics against conventional semisupervised and supervised methods. The data quality, model security, and memory scalability were also evaluated.
Results The proposed method for semisupervised classification outperforms related semisupervised methods under the same setup, with the average area under the receiver operating characteristics curve (AUC) reaching 0.945, 0.673, 0.611, and 0.588 for the four data sets, respectively, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, and 0.5676, respectively) and label propagation (0.475, 0.344, 0.440, and 0.477, respectively). The average classification AUCs with 10% labeled data were 0.929, 0.719, 0.652, and 0.650, respectively, comparable to that of the supervised learning methods logistic regression (0.601, 0.670, 0.731, and 0.710, respectively), support vector machines (0.733, 0.720, 0.720, and 0.721, respectively), and random forests (0.982, 0.750, 0.758, and 0.740, respectively). The concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation.
Conclusions Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and achieve comparable learning performance to supervised methods.
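
The evaluation setup described in the abstract (hiding most labels, then scoring predictions by AUC) can be illustrated with a minimal sketch. This is not the authors' code or their network-based generative adversarial method. The function names `mask_labels` and `roc_auc`, the toy labels, and the toy scores are all assumptions made for illustration, and only the Python standard library is used.

```python
import random


def mask_labels(labels, labeled_fraction, seed=0):
    """Keep a random fraction of labels and mark the rest as unlabeled (None).

    Hypothetical helper: mimics a label-deficient setting such as the
    5% to 25% labeled-data range mentioned in the abstract.
    """
    rng = random.Random(seed)
    n_keep = max(1, round(len(labels) * labeled_fraction))
    keep = set(rng.sample(range(len(labels)), n_keep))
    return [y if i in keep else None for i, y in enumerate(labels)]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 20 binary labels, of which 10% stay visible.
labels = [0, 1] * 10
masked = mask_labels(labels, labeled_fraction=0.10)
n_labeled = sum(y is not None for y in masked)

# Toy predictions (assumed): scores from any classifier on four held-out cases.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
```

In this sketch a semisupervised learner would be fitted on `masked`, treating `None` entries as unlabeled, and `roc_auc` would be applied to its scores on held-out records. Averaging that value over several labeled fractions would give a summary comparable in kind, though not in value, to the average AUCs reported above.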

Publisher

JMIR Publications Inc.

Subject

Health Information Management,Health Informatics

Cited by 3 articles.

1. Revolutionizing personalized medicine with generative AI: a systematic review; Artificial Intelligence Review; 2024-04-25

2. Revolutionizing Personalized Medicine with Generative AI: A Systematic Review; 2024-01-24

3. Deep learning-based particle gradation detection of fillers; 2023 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML); 2023-11-03