Author:
Zhou Weipeng, Yetisgen Meliha, Afshar Majid, Gao Yanjun, Savova Guergana, Miller Timothy A.
Abstract
Objective
The classification of clinical note sections is a critical step before more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Clinical note section classification models that achieve high accuracy at one institution often suffer a large drop in accuracy when transferred to another. The objective of this study is to develop methods that classify clinical note sections under the SOAP (“Subjective”, “Objective”, “Assessment” and “Plan”) framework with improved transferability.
Materials and methods
We trained baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain adaptive pretraining (DAPT) and task adaptive pretraining (TAPT). We added in-domain annotated samples during fine-tuning and observed model performance as the number of annotated samples varied. Finally, we quantified the impact of continued pretraining as the equivalent number of in-domain annotated samples added.
Results
We found that continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across three datasets. This improvement was equivalent to adding 50.2 in-domain annotated samples.
Discussion
Although section classification is considered a straightforward task in-domain, it remains considerably difficult cross-domain, even with highly sophisticated neural network-based methods.
Conclusion
Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.
Publisher
Cold Spring Harbor Laboratory