Extraction of clinical phenotypes for Alzheimer’s disease dementia from clinical notes using natural language processing

Author:

Oh Inez Y (1), Schindler Suzanne E (2), Ghoshal Nupur (2, 3), Lai Albert M (1), Payne Philip R O (1), Gupta Aditi (1, 4)

Affiliation:

1. Institute for Informatics, Washington University School of Medicine, St. Louis, Missouri, USA

2. Department of Neurology, Washington University School of Medicine, St. Louis, Missouri, USA

3. Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA

4. Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri, USA

Abstract

Objectives: There is much interest in utilizing clinical data for developing prediction models for Alzheimer’s disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR.

Materials and Methods: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings.

Results: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen’s kappa = 0.72–1) and positively correlated with the NLP-based phenotype extraction pipeline’s performance (average F1-score = 0.65–0.99) for each phenotype.

Discussion: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success.

Conclusion: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
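The two metrics reported in the Results, Cohen's kappa for interannotator agreement and the F1-score for extraction performance, can be illustrated with a small worked example. The sketch below is not the authors' code: the function names and the binary labels (1 = phenotype documented, 0 = not documented) are hypothetical toy data chosen only to show how each figure is computed.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def f1_score(gold, predicted, positive=1):
    """F1-score of predicted labels against gold labels for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for eight notes (not data from the study).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0]
pipeline_output = [1, 1, 0, 1, 1, 0, 1, 0]

kappa = cohens_kappa(annotator_1, annotator_2)
f1 = f1_score(annotator_1, pipeline_output)
print(f"kappa = {kappa:.2f}, F1 = {f1:.3f}")
```

In this toy case the annotators agree on 7 of 8 notes while chance agreement is 0.5, so kappa is 0.75; the pipeline makes one false-positive error against annotator 1, giving precision 0.8, recall 1.0, and an F1-score of about 0.889.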

Funder

Centene Corporation

Washington University-Centene

Publisher

Oxford University Press (OUP)

Subject

Health Informatics
