Author:
Zhou Jiayin, Hao Jie, Tang Mingkun, Sun Haixia, Wang Jiayang, Li Jiao, Qian Qing
Abstract
Objective
This study aimed to develop and validate a quantitative index system for evaluating the data quality of Electronic Medical Records (EMR) in disease risk prediction using Machine Learning (ML).
Materials and methods
The index system was developed in four steps: (1) a preliminary index system was outlined based on a literature review; (2) the Delphi method was used to structure the indicators at all levels; (3) the weights of these indicators were determined using the Analytic Hierarchy Process (AHP) method; and (4) the developed index system was empirically validated using real-world EMR data in an ML-based disease risk prediction task.
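As an illustration of step (3), the sketch below shows one common way AHP weights are derived: the principal eigenvector of a pairwise comparison matrix is normalized to sum to 1, and a consistency ratio (CR) is computed to check the judgments. This is a minimal sketch under stated assumptions, not the authors' implementation; the function name `ahp_weights`, the example matrix, and the use of Saaty's random index values are all assumptions introduced here, and the judgments shown are hypothetical rather than taken from the study.

```python
import numpy as np

# Saaty's commonly cited random consistency index (RI) for matrix sizes 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix.

    Hypothetical helper: weights are the normalized principal eigenvector.
    """
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]

    # Principal eigenvalue and eigenvector of the comparison matrix.
    eigvals, eigvecs = np.linalg.eig(a)
    k = np.argmax(eigvals.real)
    lam_max = eigvals[k].real

    # Normalize the eigenvector so the weights sum to 1.
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()

    # Consistency index and ratio; CR is defined as 0 when RI is 0.
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


# Hypothetical expert judgments for four first-level indicators
# (illustrative values only, not data from the study).
example = [
    [1,     2,     4, 4],
    [1 / 2, 1,     2, 2],
    [1 / 4, 1 / 2, 1, 1],
    [1 / 4, 1 / 2, 1, 1],
]

weights, cr = ahp_weights(example)
print("weights:", np.round(weights, 3))
print("consistency ratio:", round(cr, 4))
```

In AHP practice a CR below 0.1 is conventionally treated as acceptable; whether the study applied that threshold is not stated in this abstract.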
Results
The synthesis of review findings and expert consultations led to the formulation of a three-level index system with four first-level, 11 second-level, and 33 third-level indicators. The weights of these indicators were obtained through the AHP method. Results from the empirical analysis showed a positive relationship between the scores assigned to the datasets by the proposed index system and the predictive performance achieved on those datasets.
Discussion
The proposed index system for evaluating EMR data quality is grounded in extensive literature analysis and expert consultation. Moreover, the system's high reliability and suitability have been affirmed through empirical validation.
Conclusion
The novel index system offers a robust framework for assessing the quality and suitability of EMR data in ML-based disease risk predictions. It can serve as a guide in building EMR databases, improving EMR data quality control, and generating reliable real-world evidence.
Publisher
Springer Science and Business Media LLC