Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment

Author:

Banerjee Imon (1), Li Kevin (2), Seneviratne Martin (1, 3), Ferrari Michelle (4), Seto Tina (5), Brooks James D (4), Rubin Daniel L (1, 6, 7), Hernandez-Boussard Tina (1, 7, 8)

Affiliation:

1. Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA

2. Stanford University School of Medicine, 291 Campus Drive, Stanford, California 94305-5479, USA

3. Department of Biomedical Informatics, Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA

4. Department of Urology - Divisions, Stanford University School of Medicine, 875 Blake Wilbur, Stanford, California 94305-5479, USA

5. IRT Research Technology, Stanford University School of Medicine, Stanford, California 94305-5479, USA

6. Department of Radiology, Stanford University School of Medicine, Stanford, California 94305-5479, USA

7. Department of Medicine (Biomedical Informatics), Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA

8. Department of Surgery, Stanford University School of Medicine, 300 Pasteur Drive Stanford, California 94305-2200, USA

Abstract

Background: Population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence of these outcomes. We present, and demonstrate the accuracy of, an NLP pipeline that assesses the presence, absence, or risk discussion of two important PCOs following prostate cancer treatment: urinary incontinence (UI) and bowel dysfunction (BD).

Methods: We propose a weakly supervised NLP approach that annotates electronic medical record clinical notes without requiring manual chart review. A weighted function of neural word embeddings was used to create a sentence-level vector representation of relevant expressions extracted from the clinical notes. The sentence vectors were used as input to a multinomial logistic model whose output is presence, absence, or risk discussion of UI/BD. The classifier was trained on automated sentence annotations derived only from domain-specific dictionaries (weak supervision).

Results: The model achieved an average F1 score of 0.86 on the sentence-level, three-tier classification task (presence/absence/risk) for both UI and BD. The model also outperformed a pre-existing rule-based model for note-level annotation of UI by a significant margin.

Conclusions: We demonstrate a machine learning method for categorizing clinical notes by important PCOs. It trains a classifier on sentence vector representations labeled with a domain-specific dictionary, which eliminates the need for manual engineering of linguistic rules or manual chart review to extract the PCOs. The weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical text notes compared to rule-based algorithms.

Trial registration: This is a chart review study and was approved by the Institutional Review Board (IRB).
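The three steps named in the Methods (dictionary-based weak labeling, weighted sentence vectors, multinomial logistic classification) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tiny term lists, the random stand-in embeddings (a real pipeline would use trained neural word embeddings), the IDF weighting, the rule precedence, and all function names.

```python
import math
import random

# Hypothetical mini-dictionaries; the study's real domain dictionaries are not shown here.
TARGET_TERMS = {"incontinence", "leakage", "pads"}
NEGATION_CUES = {"no", "denies", "without"}
RISK_CUES = {"risk", "risks", "discussed"}
LABELS = ["presence", "absence", "risk"]
DIM = 8

def weak_label(tokens):
    """Label a sentence from dictionaries alone; None if no target term occurs."""
    words = set(tokens)
    if not words & TARGET_TERMS:
        return None
    if words & RISK_CUES:          # assumed precedence: risk before negation
        return "risk"
    if words & NEGATION_CUES:
        return "absence"
    return "presence"

def toy_embeddings(vocab, seed=0):
    """Stand-in for trained word embeddings: one random vector per word."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in sorted(vocab)}

def sentence_vector(tokens, emb, idf):
    """Weighted average of word vectors (IDF weights are an assumed choice)."""
    vec, total = [0.0] * DIM, 0.0
    for t in tokens:
        if t in emb:
            w = idf.get(t, 1.0)
            total += w
            vec = [v + w * e for v, e in zip(vec, emb[t])]
    return [v / total for v in vec] if total else vec

def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

def train_softmax(X, y, n_classes, epochs=300, lr=0.5):
    """Multinomial logistic regression fit by batch gradient descent."""
    W = [[0.0] * (DIM + 1) for _ in range(n_classes)]  # last column is the bias
    for _ in range(epochs):
        grad = [[0.0] * (DIM + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = x + [1.0]
            p = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                for j in range(DIM + 1):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(DIM + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def predict(W, x):
    xb = x + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return LABELS[scores.index(max(scores))]

# Made-up example sentences, not real clinical text.
sentences = [s.split() for s in [
    "patient reports urinary incontinence and uses pads daily",
    "patient denies any urinary leakage",
    "discussed the risk of incontinence after surgery",
    "no incontinence reported",
    "follow up in six months",
    "leakage with coughing",
]]
vocab = {t for s in sentences for t in s}
emb = toy_embeddings(vocab)
idf = {w: math.log(len(sentences) / sum(w in s for s in sentences)) + 1.0 for w in vocab}

# Weak supervision: keep only sentences the dictionaries can label.
labeled = [(s, weak_label(s)) for s in sentences if weak_label(s) is not None]
X = [sentence_vector(s, emb, idf) for s, _ in labeled]
y = [LABELS.index(lab) for _, lab in labeled]
W = train_softmax(X, y, len(LABELS))
```

The point of the design is that no sentence is labeled by hand: the dictionaries supply noisy training labels, and the classifier operates on dense sentence vectors rather than on the dictionary rules themselves. With trained embeddings in place of the random stand-ins, this may let it generalize to phrasings the dictionaries miss.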

Funder

National Cancer Institute of the National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

