BACKGROUND
Insufficient participant enrollment is a major cause of clinical trial failure.
OBJECTIVE
We developed a machine learning (ML)-based framework that uses clinical laboratory parameters to identify participants eligible for enrollment in clinical trials.
METHODS
We acquired the records of 11,592 patients with gastric cancer from the electronic medical records of Kyungpook National University Hospital in Korea, covering 2011 to 2019. Eight clinical laboratory parameters (hemoglobin, neutrophil count, platelet count, total bilirubin, aspartate aminotransferase, alanine aminotransferase, alkaline phosphatase, and creatinine) and their acquisition dates were used for ML model development. The dataset was split chronologically: data from 2011 to 2018 formed the training set used to design the ML-based candidate selection method, and data from 2019 formed the test set used to evaluate its performance. Generalization performance was assessed using the F1 score and the area under the receiver operating characteristic curve (AUC). To evaluate its efficacy in recruiting participants, the proposed model was compared with random selection.
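A minimal Python sketch of the evaluation protocol described above, under stated assumptions: the LabRecord structure, its binary "eligible" label, and all function names are hypothetical, because the abstract does not specify the data schema or the model. The sketch shows the chronological split (training on records acquired before 2019, testing on 2019 records) and computes the AUC and F1 score from predicted probabilities using only the standard library. The 0.5 decision threshold used for F1 is also an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LabRecord:
    patient_id: str
    acquired: date   # acquisition date of the laboratory test
    values: dict     # e.g. {"hemoglobin": 12.1, "creatinine": 0.9}
    eligible: bool   # hypothetical label: met the trial's laboratory criteria


def temporal_split(records, test_year=2019):
    """Train on records acquired before test_year; test on records from test_year."""
    train = [r for r in records if r.acquired.year < test_year]
    test = [r for r in records if r.acquired.year == test_year]
    return train, test


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 of the predictions obtained by thresholding the scores (threshold is assumed)."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

On a toy test set with labels [True, False, True, False] and scores [0.9, 0.4, 0.35, 0.1], roc_auc returns 0.75 and f1_score returns about 0.667. Splitting by date rather than at random mirrors the abstract's use of a later year as the test set, which checks whether the model generalizes forward in time.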
RESULTS
The AUCs for the individual clinical parameters ranged from 0.789 to 0.915 on the test set, indicating good predictive performance. We ranked patients in descending order of their ML-predicted probability and screened them in that order to identify valid candidates. The proposed ML model identified valid clinical trial participants faster than random selection, reducing the screening workload by up to 57%.
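The ranked screening step and its comparison with random selection can be illustrated with the following sketch. Here, workload is assumed to mean the number of patients reviewed before k eligible candidates are found; the abstract does not define it, so this definition and the helper names are hypothetical. For random selection, the sketch uses the expected number of draws without replacement needed to find k eligible patients among N patients, K of whom are eligible: k(N+1)/(K+1).

```python
def screened_until_k_eligible(ordered_labels, k):
    """Number of patients reviewed, in the given order, until k eligible ones are found."""
    found = 0
    for reviewed, eligible in enumerate(ordered_labels, start=1):
        if eligible:
            found += 1
            if found == k:
                return reviewed
    raise ValueError("fewer than k eligible patients in the list")


def expected_random_screens(n_total, n_eligible, k):
    """Expected draws without replacement to find k eligible patients: k(N+1)/(K+1)."""
    if not 0 < k <= n_eligible <= n_total:
        raise ValueError("require 0 < k <= n_eligible <= n_total")
    return k * (n_total + 1) / (n_eligible + 1)


def workload_reduction(labels, scores, k):
    """Hypothetical metric: 1 - (ML-ranked screens) / (expected random screens)."""
    # Review patients from highest to lowest predicted probability.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    ml_screens = screened_until_k_eligible(ranked_labels, k)
    random_screens = expected_random_screens(len(labels), sum(labels), k)
    return 1 - ml_screens / random_screens
```

In a toy case with 9 patients, 2 of them eligible and both ranked first by the model, ranked screening needs 2 reviews versus an expected 6.67 under random selection, a reduction of 0.7 under this assumed definition.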
CONCLUSIONS
The proposed ML-based framework using clinical laboratory parameters can be used to identify patients eligible for a clinical trial, enabling faster participant enrollment.
CLINICALTRIAL
KNUH 2020-04-023