Generating real-world evidence from unstructured clinical notes to examine clinical utility of genetic tests: use case in BRCAness

Published: 2021-01-06 Issue: 1 Volume: 21
ISSN: 1472-6947
Container-title: BMC Medical Informatics and Decision Making
Language: en
Short-container-title: BMC Med Inform Decis Mak

Author:

Zhao Yiqing, Weroha Saravut J., Goode Ellen L., Liu Hongfang, Wang Chen

Abstract

Background: Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. By mining real-world data (RWD) from clinical notes, we can extract patients’ genetic information and associate treatment decisions with it.

Methods: We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before the final association analysis. The framework was demonstrated in a Foundation-tested women’s cancer cohort (N = 196). After retrieving patients’ genetic information with the NLP system, we assessed the completeness of the genetic data captured in unstructured clinical notes against a genetic data model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment process, and then analyzed the association between BRCA1/2 mutation status and the discussion or prescription of targeted therapy.

Results: We identified seven topics in the clinical context of genetic mentions: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance. Our rule-based system achieved a precision of 0.87, recall of 0.93 and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899 and F-measure of 0.9 for four-topic classification, and a precision of 0.833, recall of 0.823 and F-measure of 0.82 for seven-topic classification. In result-containing sentences, the capture rate of BRCA1/2 mutation information was 75%, but detailed variant information (e.g. variant types) was largely missing. Using cleaned RWD, significant associations were found between BRCA1/2 positive mutation and targeted therapies.

Conclusions: We demonstrated a framework to generate RWE using RWD from different clinical sources. The rule-based NLP system achieved the best performance for resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist, so manual data cleaning is needed before further analysis can be performed. Finally, we were able to use cleaned RWD to evaluate the real-world utility of genetic information to initiate a prescription of targeted therapy.
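
The abstract does not list the rules or the evaluation code, so the following is only an illustrative sketch of how a rule-based topic classifier for BRCA1/2 mentions and its precision, recall and F-measure might look. The keyword patterns, the rule ordering, and the names `TOPIC_RULES`, `classify` and `precision_recall_f1` are all assumptions made for this example, not the authors' system; only the seven topic labels come from the abstract.

```python
import re

# Hypothetical keyword rules, checked in order; the paper's real rules are not given.
TOPIC_RULES = [
    ("Variants of unknown significance",
     re.compile(r"\b(vus|variants? of (unknown|uncertain) significance)\b", re.I)),
    ("Negative", re.compile(r"\b(negative|no (deleterious )?mutations?|not detected)\b", re.I)),
    ("Positive", re.compile(r"\b(positive|deleterious|pathogenic)\b", re.I)),
    ("Insurance", re.compile(r"\b(insurance|coverage|prior authorization)\b", re.I)),
    ("Order", re.compile(r"\b(ordered|will order|sent for)\b", re.I)),
    ("Evaluation", re.compile(r"\b(referr\w*|genetic counsel\w*|evaluat\w*)\b", re.I)),
]
# Matches BRCA1, BRCA2, BRCA 1, BRCA1/2 (assumed surface forms).
GENE = re.compile(r"\bBRCA\s?[12](/2)?\b", re.I)


def classify(sentence):
    """Return a topic label for a sentence mentioning BRCA1/2, else None."""
    if not GENE.search(sentence):
        return None
    for topic, pattern in TOPIC_RULES:
        if pattern.search(sentence):
            return topic
    return "Information"  # fallback topic when no specific rule fires


def precision_recall_f1(gold, predicted, topic):
    """Per-topic precision, recall and F-measure (harmonic mean of the two)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == topic and p == topic for g, p in pairs)
    fp = sum(g != topic and p == topic for g, p in pairs)
    fn = sum(g == topic and p != topic for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    notes = [
        "BRCA1/2 testing was negative.",
        "She carries a pathogenic BRCA1 mutation.",
        "BRCA1/2 genes are involved in DNA repair.",
    ]
    for note in notes:
        print(classify(note), "<-", note)
```

Ordering the rules so that the most specific pattern is tried first is a design choice of this sketch; a production system would likely also need negation and context handling that simple keywords cannot provide.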

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics, Health Policy, Computer Science Applications

Link

http://link.springer.com/content/pdf/10.1186/s12911-020-01364-y.pdf

Cited by 7 articles.

1. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review; JMIR Medical Informatics; 2023-12-15

2. Artificial intelligence-driven biomedical genomics; Knowledge-Based Systems; 2023-11

3. Leveraging a pharmacogenomics knowledgebase to formulate a drug response phenotype terminology for genomic medicine; Bioinformatics; 2022-10-12

4. Development of an Electronic Health Record Registry to Facilitate Collection of Commission on Cancer Metrics for Patients Undergoing Surgery for Breast Cancer; JCO Clinical Cancer Informatics; 2022-10

5. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review (Preprint); 2022-09-05