Identification of Semantically Similar Sentences in Clinical Notes: Iterative Intermediate Training Using Multi-Task Learning

Published: 2020-11-27
Issue: 11
Volume: 8
Page: e22508
ISSN: 2291-9694
Container-title: JMIR Medical Informatics
Language: en
Short-container-title: JMIR Med Inform

Author:

Mahajan Diwakar, Poddar Ananya, Liang Jennifer J, Lin Yen-Ting, Prager John M, Suryanarayanan Parthasarathy, Raghavan Preethi, Tsou Ching-Huei

Abstract

Background: Although electronic health records (EHRs) have been widely adopted in health care, effective use of EHR data is often limited because of redundant information in clinical notes introduced by the use of templates and copy-paste during note generation. Thus, it is imperative to develop solutions that can condense information while retaining its value. A step in this direction is measuring the semantic similarity between clinical text snippets. To address this problem, we participated in the 2019 National NLP Clinical Challenges (n2c2)/Open Health Natural Language Processing Consortium (OHNLP) clinical semantic textual similarity (ClinicalSTS) shared task.

Objective: This study aims to improve the performance and robustness of semantic textual similarity in the clinical domain by leveraging manually labeled data from related tasks and contextualized embeddings from pretrained transformer-based language models.

Methods: The ClinicalSTS data set consists of 1642 pairs of deidentified clinical text snippets annotated on a continuous scale of 0 to 5, indicating degrees of semantic similarity. We developed an iterative intermediate training approach using multi-task learning (IIT-MTL), a multi-task training approach that employs iterative data set selection. We applied this process to bidirectional encoder representations from transformers on clinical text mining (ClinicalBERT), a pretrained domain-specific transformer-based language model, and fine-tuned the resulting model on the target ClinicalSTS task. We incrementally ensembled the output from applying IIT-MTL on ClinicalBERT with the output of other language models (bidirectional encoder representations from transformers for biomedical text mining [BioBERT], multi-task deep neural networks [MT-DNN], and robustly optimized BERT approach [RoBERTa]) and handcrafted features using regression-based learning algorithms. On the basis of these experiments, we adopted the top-performing configurations as our official submissions.

Results: Our system ranked first out of 87 submitted systems in the 2019 n2c2/OHNLP ClinicalSTS challenge, achieving state-of-the-art results with a Pearson correlation coefficient of 0.9010. This winning system was an ensembled model leveraging the output of IIT-MTL on ClinicalBERT with BioBERT, MT-DNN, and handcrafted medication features.

Conclusions: This study demonstrates that IIT-MTL is an effective way to leverage annotated data from related tasks to improve performance on a target task with a limited data set. This contribution opens new avenues of exploration for optimized data set selection to generate more robust and universal contextual representations of text in the clinical domain.
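The abstract reports system quality as a Pearson correlation coefficient between predicted and annotated similarity scores. As a rough illustration of that metric only, not the authors' evaluation code, the sketch below computes it in plain Python. The function name `pearson` and the sample `gold` and `pred` values are hypothetical and chosen purely for demonstration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical annotations and predictions on the 0-5 similarity scale
gold = [0.0, 1.5, 3.0, 4.0, 5.0]
pred = [0.4, 1.2, 2.7, 4.3, 4.8]
score = pearson(gold, pred)
print(f"Pearson r = {score:.4f}")
```

A value near 1 means the predicted scores rise and fall with the annotated ones. Because the coefficient is invariant to positive linear rescaling of either sequence, it rewards agreement in trend more than agreement in absolute values.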

Publisher

JMIR Publications Inc.

Subject

Health Information Management,Health Informatics

Cited by 18 articles.

1. A new era in healthcare: The integration of artificial intelligence and microbial;Medicine in Novel Technology and Devices;2024-09

2. Large Language Models to process, analyze, and synthesize biomedical texts – a scoping review;2024-04-25

3. Language model and its interpretability in biomedicine: A scoping review;iScience;2024-04

4. Transformers in health: a systematic review on architectures for longitudinal data analysis;Artificial Intelligence Review;2024-02-03

5. BERT-Based Neural Network for Inpatient Fall Detection From Electronic Medical Records: Retrospective Cohort Study;JMIR Medical Informatics;2024-01-30