A Natural Language Processing Model to Identify Confidential Content in Adolescent Clinical Notes

Author:

Rabbani Naveed (1), Bedgood Michael (2), Brown Conner (3), Steinberg Ethan (4, 5), Goldstein Rachel L. (6), Carlson Jennifer L. (6), Pageler Natalie (1), Morse Keith E. (1)

Affiliation:

1. Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States

2. California Department of Public Health, Richmond, California, United States

3. Information Services Department, Lucile Packard Children's Hospital, Palo Alto, California, United States

4. Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, United States

5. Department of Computer Science, Stanford University, Stanford, California, United States

6. Division of Adolescent Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States

Abstract

Background: The 21st Century Cures Act mandates the immediate, electronic release of health information to patients. However, in the case of adolescents, special consideration is required to ensure that confidentiality is maintained. The detection of confidential content in clinical notes may support operational efforts to preserve adolescent confidentiality while implementing information sharing.

Objectives: This study aimed to determine whether a natural language processing (NLP) algorithm can identify confidential content in adolescent clinical progress notes.

Methods: A total of 1,200 outpatient adolescent progress notes written between 2016 and 2019 were manually annotated to identify confidential content. Labeled sentences from this corpus were featurized and used to train a two-part logistic regression model, which provides both sentence-level and note-level probability estimates that a given text contains confidential content. This model was prospectively validated on a set of 240 progress notes written in May 2022. It was subsequently deployed in a pilot intervention to augment an ongoing operational effort to identify confidential content in progress notes. Note-level probability estimates were used to triage notes for review, and sentence-level probability estimates were used to highlight high-risk portions of those notes to aid the manual reviewer.

Results: The prevalence of notes containing confidential content was 21% (255/1,200) in the train/test cohort and 22% (53/240) in the validation cohort. The ensemble logistic regression model achieved an area under the receiver operating characteristic curve of 90% in the test cohort and 88% in the validation cohort. Its use in a pilot intervention identified outlier documentation practices and demonstrated efficiency gains over completely manual note review.

Conclusion: An NLP algorithm can identify confidential content in progress notes with high accuracy. Its human-in-the-loop deployment in clinical operations augmented an ongoing operational effort to identify confidential content in adolescent progress notes. These findings suggest NLP may be used to support efforts to preserve adolescent confidentiality in the wake of the information blocking mandate.
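
The triage-and-highlight workflow described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ScoredNote`, `triage`, and `highlight`, the thresholds, and the example sentences and probabilities are all hypothetical. The note-level score here is simply the maximum sentence probability, a stand-in for the second logistic regression stage that the study used for note-level estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredNote:
    note_id: str
    sentences: List[str]
    sentence_probs: List[float]  # assumed output of a sentence-level classifier

    @property
    def note_prob(self) -> float:
        # Stand-in aggregation (an assumption): use the highest sentence
        # probability as the note score. The study instead derived the
        # note-level estimate from a second logistic regression stage.
        return max(self.sentence_probs, default=0.0)


def triage(notes: List[ScoredNote], note_threshold: float) -> List[ScoredNote]:
    """Return notes at or above the threshold, highest risk first."""
    flagged = [n for n in notes if n.note_prob >= note_threshold]
    return sorted(flagged, key=lambda n: n.note_prob, reverse=True)


def highlight(note: ScoredNote, sentence_threshold: float) -> List[str]:
    """Return the sentences a manual reviewer should look at first."""
    return [
        s for s, p in zip(note.sentences, note.sentence_probs)
        if p >= sentence_threshold
    ]


# Made-up notes and probabilities, for illustration only.
notes = [
    ScoredNote("A", ["Here for sports physical.", "Doing well in school."], [0.02, 0.05]),
    ScoredNote("B", ["Reports knee pain.", "Discussed contraception privately."], [0.03, 0.91]),
    ScoredNote("C", ["Follow-up for asthma.", "Asked about mood confidentially."], [0.04, 0.62]),
]

review_queue = triage(notes, note_threshold=0.5)
queue_ids = [n.note_id for n in review_queue]
flagged_b = highlight(review_queue[0], sentence_threshold=0.5)
```

In this sketch the reviewer sees only notes B and C, in that order, with the high-probability sentence in each note surfaced first. In practice both thresholds would need to be tuned against the sensitivity required by the reviewing team.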

Publisher

Georg Thieme Verlag KG

Subject

Health Information Management, Computer Science Applications, Health Informatics
