Multi-step Transfer Learning in Natural Language Processing for the Health Domain

Author:

Manaka Thokozile,Zyl Terence Van,Kar Deepak,Wade Alisha

Abstract

AbstractThe restricted access to data in healthcare facilities due to patient privacy and confidentiality policies has led to the application of general natural language processing (NLP) techniques advancing relatively slowly in the health domain. Additionally, because clinical data is unique to various institutions and laboratories, there are not enough standards and conventions for data annotation. In places without robust death registration systems, the cause of death (COD) is determined through a verbal autopsy (VA) report. A non-clinician field agent completes a VA report using a set of standardized questions as guide to identify the symptoms of a COD. The narrative text of the VA report is used as a case study to examine the difficulties of applying NLP techniques to the healthcare domain. This paper presents a framework that leverages knowledge across multiple domains via two domain adaptation techniques: feature extraction and fine-tuning. These techniques aim to improve VA text representations for COD classification tasks in the health domain. The framework is motivated by multi-step learning, where a final learning task is realized via a sequence of intermediate learning tasks. The framework builds upon the strengths of the Bidirectional Encoder Representations from Transformers (BERT) and Embeddings from Language Models (ELMo) models pretrained on the general English and biomedical domains. These models are employed to extract features from the VA narratives. Our results demonstrate improved performance when initializing the learning of BERT embeddings with ELMo embeddings. The benefit of incorporating character-level information for learning word embeddings in the English domain, coupled with word-level information for learning word embeddings in the biomedical domain, is also evident.

Funder

University of the Witwatersrand

Publisher

Springer Science and Business Media LLC

Reference79 articles.

1. United Nations (2013) Department of economic and social affairs, population division, united nations. World Population Prospects: The 2012 revision

2. World Health Organisation (2007) Verbal autopsy standards: ascertaining and attributing cause of death, Geneva. Switzerland, World Health Organisation

3. Hirschman L, Chapman WW, D’Avolio LW, Savova GK, Uzuner O (2011) Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc 18(5):450–453

4. Ohno-Machado L, Nadkarni P, Chapman W (2011) Natural language processing: an introduction. J Am Med Inform Assoc 18:544–51

5. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3