Generating real-world evidence from unstructured clinical notes to examine clinical utility of genetic tests: use case in BRCAness

Author:

Zhao Yiqing,Weroha Saravut J.,Goode Ellen L.,Liu Hongfang,Wang ChenORCID

Abstract

Abstract Background Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. Through mining real-world data (RWD) from clinical notes, we could extract patients’ genetic information and further associate treatment decisions with genetic information. Methods We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before final association analysis. The framework was demonstrated in a Foundation-tested women cancer cohort (N = 196). Upon retrieval of patients’ genetic information using NLP system, we assessed the completeness of genetic data captured in unstructured clinical notes according to a genetic data-model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment process, and then analyzed the association between BRCA1/2 mutation status and the discussion/prescription of targeted therapy. Results We identified seven topics in the clinical context of genetic mentions including: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance. Our rule-based system achieved a precision of 0.87, recall of 0.93 and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899 and F-measure of 0.9 for four-topic classification and a precision of 0.833, recall of 0.823 and F-measure of 0.82 for seven-topic classification. We found in result-containing sentences, the capture of BRCA1/2 mutation information was 75%, but detailed variant information (e.g. variant types) is largely missing. Using cleaned RWD, significant associations were found between BRCA1/2 positive mutation and targeted therapies. Conclusions In conclusion, we demonstrated a framework to generate RWE using RWD from different clinical sources. Rule-based NLP system achieved the best performance for resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist thus manual data cleaning is needed before further analysis can be performed. Finally, we were able to use cleaned RWD to evaluate the real-world utility of genetic information to initiate a prescription of targeted therapy.

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics,Health Policy,Computer Science Applications

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3