Approach to Machine Learning for Extraction of Real-World Data Variables from Electronic Health Records

Published: 2023-03-06

Author:

Adamson Blythe, Waskom Michael, Blarre Auriane, Kelly Jonathan, Krismer Konstantin, Nemeth Sheila, Gippetti James, Ritten John, Harrison Katherine, Ho George, Linzmayer Robin, Bansal Tarun, Wilkinson Samuel, Amster Guy, Estola Evan, Benedum Corey M., Fidyk Erin, Estevez Melissa, Shapiro Will, Cohen Aaron B.

Abstract

Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI’s ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability.

Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (eg, clinician notes, radiology reports, and lab reports) to output a set of structured variables required for RWD analysis. This research used a nationwide EHR-derived database. Models were selected based on performance. Variables curated with an approach using ML extraction are those where the value is determined solely by an ML model (ie, not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information.

Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with variables curated through manual abstraction. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates.

Conclusions: NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer.

Publisher

Cold Spring Harbor Laboratory

References: 44 articles.

1. Bhardwaj R, Nambiar AR, Dutta D. A study of machine learning in healthcare. Abstract presented at: 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC); July 4-8, 2017; Turin, Italy.

2. Artificial intelligence in digital pathology — new tools for diagnosis and precision oncology

3. Machine Learning in Oncology: Methods, Applications, and Challenges

4. A frame semantic overview of NLP-based information extraction for cancer-related EHR notes

Cited by 5 articles.

1. Using Artificial Intelligence to Label Free-Text Operative and Ultrasound Reports for Grading Pediatric Appendicitis;Journal of Pediatric Surgery;2024-05

2. Using Artificial Intelligence To Label Free-Text Operative And Ultrasound Reports For Grading Pediatric Appendicitis;2023-09-01

3. Real-world comparative effectiveness of acalabrutinib and ibrutinib in patients with chronic lymphocytic leukemia;Blood Advances;2023-08-09

4. Patients with HR+/HER2- metastatic breast cancer treated with CDK4/6 inhibitors: a real-world study in Italy;Breast Cancer Management;2023-06-01

5. Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning;Cancers;2023-03-20