A systematic review of the application of machine learning methods for patient recruitment through electronic health records (Preprint)

Published: 2021-05-03

Author:

Shi Wen, Moffat Keith, Kelsey Tom, Sullivan Frank

Abstract

BACKGROUND

Electronic health records (EHRs) provide potential for more efficient patient recruitment into clinical studies. In recent years, machine learning techniques have gained increasing popularity in EHR-based research, with several studies reporting that machine learning methods can be used for disease diagnosis, outcome prediction, and treatment personalization.

OBJECTIVE

To explore which machine learning methods have been utilized for recruiting patients through EHRs and to compare their characteristics and outcomes.

METHODS

A search was conducted in MEDLINE, Embase, Scopus, and OpenGrey for papers published before June 20, 2019. The search strategy incorporated both relevant subject headings and relevant terms in the title, abstract, and keywords. Two reviewers screened papers and decided which to include, then independently extracted data and assessed bias for each included paper. Conflicts were resolved by a third reviewer. Included studies were compared in terms of year of publication, study location, study type, EHR setting, study aim, clinical trial domain, dataset size, models or methods, data types, data processing and feature selection, evaluation, and outcomes.

RESULTS

Eleven papers were included for synthesis. Ten were in-silico studies that simulated prediction of participant recruitment on computers. One study evaluated a machine learning-assisted recruitment procedure in a real clinical setting. The in-silico studies covered diverse empirical frameworks in terms of the number of trials used, trial domains, unit of analysis, dataset size, outcome definition, methods of data pre-processing, model building and evaluation, and performance measures. Different machine learning methods seem to have similar performance when evaluated under the same circumstances. NLP-incorporated similarity comparison appears more likely to generate better performance than similarity comparison applied to structured data alone in a similar experimental set-up. A single performance measure is not sufficient to fully evaluate a method. All the in-silico studies were judged to be at high risk of bias. The sole interventional study reported that a significantly smaller percentage of time was spent on electronic screening when using the system than when not using it (P < 0.001), but a critical risk of bias was assigned to this study in the bias assessment phase.
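
The point that a single performance measure is insufficient can be illustrated with a minimal sketch. The cohort below is entirely hypothetical (1000 patients, 20 of them eligible), as are the two sets of predictions and the helper name `screening_metrics`; none of it is taken from the reviewed studies. It only shows how accuracy alone can rank a useless screener above a useful one when eligible patients are rare.

```python
def screening_metrics(y_true, y_pred):
    """Compute several measures for binary eligibility predictions (1 = eligible)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 when a measure is undefined (zero denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical cohort: 20 eligible patients followed by 980 ineligible ones.
y_true = [1] * 20 + [0] * 980

# Assumed screener A: flags nobody as eligible.
flag_nobody = [0] * 1000

# Assumed screener B: finds 15 of the 20 eligible patients, with 30 false alarms.
some_model = [1] * 15 + [0] * 5 + [1] * 30 + [0] * 950

trivial = screening_metrics(y_true, flag_nobody)
useful = screening_metrics(y_true, some_model)

for name, scores in (("flag nobody", trivial), ("some model", useful)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this toy setting the screener that flags nobody has the higher accuracy, yet its recall is zero, so it would recruit no one. Reporting precision, recall, and specificity together exposes the difference.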

CONCLUSIONS

Natural language processing techniques might be able to boost the accuracy of identifying eligible participants. However, complex methods impose more requirements on the availability of patient data and, because of the high risk of overfitting, require customization to the target EHR system. It would be valuable if future in-silico studies in this domain provided more details on data pre-processing, modelling, and evaluation, and reported different types of performance measures. More interventional studies are needed to generate higher-quality evidence.

CLINICALTRIAL

Review registration: PROSPERO CRD42018103355

Publisher

JMIR Publications Inc.
