A systematic review of the application of machine learning methods for patient recruitment through electronic health records (Preprint)

Author:

Shi Wen, Moffat Keith, Kelsey Tom, Sullivan Frank

Abstract

BACKGROUND

Electronic health records (EHRs) provide potential for more efficient patient recruitment into clinical studies. In recent years, machine learning techniques have gained increasing popularity in EHR-based research, with several studies reporting that machine learning methods can be used for disease diagnosis, outcome prediction, and treatment personalization.

OBJECTIVE

To explore which machine learning methods have been utilized for recruiting patients through EHRs and to compare their characteristics and outcomes.

METHODS

A search was conducted in MEDLINE, Embase, Scopus, and OpenGrey for papers published before June 20, 2019. Both relevant subject headings and relevant terms in the title, abstract, and keywords were incorporated into the search strategy. Two reviewers screened papers, decided which to include, and then independently extracted data and assessed bias for each included paper. Conflicts were resolved through a third reviewer. Included studies were compared in terms of year of publication, study location, study type, EHR setting, study aim, clinical trial domain, size of the dataset, models or methods, data types, data processing and feature selection, evaluation, and outcomes.

RESULTS

Eleven papers were included for synthesis. Ten were in-silico studies that simulated prediction of participant recruitment on computers. One study evaluated a machine learning-assisted recruitment procedure in a real clinical setting. The in-silico studies covered diverse empirical frameworks in terms of the number of trials used, trial domains, unit of analysis, dataset size, outcome definition, methods of data pre-processing, model building and evaluation, and performance measures. Different machine learning methods seem to have similar performance when evaluated under the same circumstances. NLP-incorporated similarity comparison appears more likely to generate better performance than similarity comparison applied to structured data alone in a similar experimental set-up. A single performance measure is not sufficient to fully evaluate a method. All the in-silico studies were judged to be at high risk of bias. The sole interventional study reported that a significantly smaller percentage of time was spent on electronic screening when using the system than when not using it (P < 0.001), but a critical risk of bias was assigned to this study in the bias assessment phase.
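
The point that a single performance measure is not sufficient can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed studies: the `ScreeningCounts` class, the `performance_measures` function, and the patient counts are all hypothetical and made up for illustration. The sketch assumes a cohort in which few patients are eligible, and shows how accuracy alone could look strong while recall stays low.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Confusion-matrix counts for a hypothetical eligibility classifier."""
    tp: int  # eligible patients flagged as eligible
    fp: int  # ineligible patients flagged as eligible
    fn: int  # eligible patients the classifier missed
    tn: int  # ineligible patients correctly excluded


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def performance_measures(c: ScreeningCounts) -> dict:
    """Compute several common measures from one set of counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Made-up counts (assumption): 1,000 patients, only 20 truly eligible.
counts = ScreeningCounts(tp=5, fp=5, fn=15, tn=975)
measures = performance_measures(counts)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy is 0.98 while recall is only 0.25, so a report quoting accuracy alone would hide the fact that most eligible patients were missed.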

CONCLUSIONS

Natural language processing techniques might be able to boost the accuracy of identifying eligible participants. However, complex methods impose more requirements on the availability of patient data and require customization of the methods to the target EHR system due to high risk of overfitting. It would be valuable if future in-silico studies in this domain could provide more details on data pre-processing, modelling and evaluation and could report different types of performance measures. More interventional studies are needed to generate higher quality evidence.

CLINICALTRIAL

Review registration: PROSPERO CRD42018103355

Publisher

JMIR Publications Inc.
