Distant Supervision with Transductive Learning for Adverse Drug Reaction Identification from Electronic Medical Records

Published:2017 Issue: Volume:2017 Page:1-21
ISSN:2040-2295
Container-title:Journal of Healthcare Engineering
language:en
Short-container-title:Journal of Healthcare Engineering

Author:

Taewijit Siriwon¹², Theeramunkong Thanaruk¹, Ikeda Mitsuru²

Affiliation:

1. The School of Information, Communication and Computer Technologies, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand

2. The School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi 923-1292, Japan

Abstract

Information extraction and knowledge discovery regarding adverse drug reactions (ADRs) from large-scale clinical texts are useful and much-needed processes. Two major difficulties of this task are the lack of domain experts for labeling examples and the intractable processing of unstructured clinical texts. Although most previous work has addressed these issues by applying semisupervised learning to the former and a word-based approach to the latter, these approaches face complexity in acquiring initial labeled data and ignore the sequential structure of natural language. In this study, we propose automatic data labeling by distant supervision, where knowledge bases are exploited to assign an entity-level relation label to each drug-event pair in the texts, and we then use patterns to characterize the ADR relation. Multiple-instance learning with the expectation-maximization method is employed to estimate model parameters. The method applies transductive learning to iteratively reassign the probability of each unknown drug-event pair at training time. In experiments with 50,998 discharge summaries, we evaluate our method by varying a large number of parameters, namely pattern types, pattern-weighting models, and the initial and iterative weightings of relations for unlabeled data. In these evaluations, our proposed method outperforms the word-based feature for NB-EM (iEM), MILR, and TSVM, with F1 score improvements of 11.3%, 9.3%, and 6.5%, respectively.
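
The distant-supervision labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the knowledge base contents, the function name `label_pairs`, and the use of `None` for unknown pairs are all assumptions made for illustration. Pairs found in the knowledge base receive a positive entity-level label, and all other pairs stay unlabeled so that a later transductive step could assign them a probability.

```python
from itertools import product

# Hypothetical knowledge base of known (drug, adverse event) pairs.
# A real system would load these from curated resources.
KNOWN_ADR_PAIRS = {("warfarin", "bleeding"), ("lisinopril", "cough")}


def label_pairs(drugs, events, kb=KNOWN_ADR_PAIRS):
    """Assign an entity-level label to every drug-event pair in a text.

    Returns a dict mapping (drug, event) to 1 if the pair is in the
    knowledge base, or None if the pair is unknown (unlabeled).
    """
    labels = {}
    for drug, event in product(drugs, events):
        pair = (drug.lower(), event.lower())
        labels[pair] = 1 if pair in kb else None
    return labels


# Drug and event mentions assumed to be already extracted from one summary.
labels = label_pairs(["Warfarin", "Aspirin"], ["bleeding"])
```

Here the pair (warfarin, bleeding) is labeled positive and (aspirin, bleeding) remains unlabeled, which mirrors the abstract's distinction between knowledge-base-labeled pairs and unknown pairs whose weights are re-estimated during training.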

Funder

Thammasat University

Publisher

Hindawi Limited

Subject

Health Informatics, Biomedical Engineering, Surgery, Biotechnology

Link

http://downloads.hindawi.com/journals/jhe/2017/7575280.pdf
