Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning
Author:
Benedum Corey M. (1), Sondhi Arjun (1), Fidyk Erin (1), Cohen Aaron B. (1,2), Nemeth Sheila (1), Adamson Blythe (1,3), Estévez Melissa (1), Bozkurt Selen (1)
Affiliation:
1. Flatiron Health, Inc., 233 Spring Street, New York, NY 10003, USA
2. Department of Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA
3. Comparative Health Outcomes, Policy and Economics (CHOICE) Institute, University of Washington, Seattle, WA 98195, USA
Abstract
Meaningful real-world evidence (RWE) generation requires unstructured data found in electronic health records (EHRs), which are often missing from administrative claims; however, obtaining relevant data from unstructured EHR sources is resource-intensive. In response, researchers are using natural language processing (NLP) with machine learning (ML) techniques (i.e., ML extraction) to extract real-world data (RWD) at scale. This study assessed the quality and fitness-for-use of EHR-derived oncology data curated using NLP with ML as compared to the reference standard of expert abstraction. Using a sample of 186,313 patients with lung cancer from a nationwide EHR-derived de-identified database, we performed a series of replication analyses representative of common analyses in retrospective observational research that uses complex EHR-derived data to generate evidence. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted and then with ML-extracted data. We used the biomarker-defined cohorts to analyze biomarker-associated survival and the treatment-defined cohorts to analyze the comparative effectiveness of treatments. Across all analyses, the results differed by less than 8% between the two data curation methods, and similar conclusions were reached. These results show that high-performance ML-extracted variables trained on expert-abstracted data can yield results similar to those obtained with abstracted data, unlocking the ability to perform oncology research at scale.
Funder
Flatiron Health, Inc.
Subject
Cancer Research, Oncology
Cited by
8 articles.