Feature Engineered Relation Extraction – Medical Documents Setting

Published:2016-08-15 Issue:3 Volume:12 Page:336-358
ISSN:1744-0084
Container-title:International Journal of Web Information Systems
language:en
Short-container-title:IJWIS

Author:

Barbantan Ioana, Porumb Mihaela, Lemnaru Camelia, Potolea Rodica

Abstract

Purpose: Improving healthcare services through assistive technologies involves both health aid devices and the analysis of the data they collect. Modeled as a knowledge base, the acquired data give more insight into each patient’s health status and needs. The ultimate goal of a healthcare system is therefore to obtain recommendations from an assistive decision support system built on such a knowledge base, benefiting patients, physicians and the healthcare industry. This paper aims to define the knowledge flow for a medical assistive decision support system by structuring raw medical data and leveraging the knowledge they contain, proposing solutions for efficient data search, medical investigation or diagnosis, medication prediction and relationship identification.

Design/methodology/approach: The proposed solution for implementing a medical assistive decision support system can analyze any type of unstructured medical document. Documents are processed by Natural Language Processing (NLP) tasks followed by semantic analysis, which leads to medical concept identification and thus imposes a structure on the input documents. The structured information is filtered and classified so that custom decisions regarding patients’ health status can be made. The current research focuses on identifying the relationships between medical concepts as defined by the REMed (Relation Extraction from Medical documents) solution, which aims to find the patterns that lead to the classification of concept pairs into concept-to-concept relations.

Findings: This paper proposes the REMed solution, expressed as a multi-class classification problem tackled with a support vector machine classifier. Experimentally, the most appropriate setup for the multi-class classification problem was found to be a combination of lexical, context, syntactic and grammatical features, as each feature category is good at representing particular relations, but not all. The best result obtained is an F1-measure of 74.9 per cent, which is 1.4 per cent better than the results reported by similar systems.

Research limitations/implications: The difficulty in discriminating between the TrIP and TrAP relations stems from the hierarchical relationship between the two classes, as TrIP is a particular type (an instance) of TrAP. The intuition behind this behavior is that the classifier cannot discern the correct relations because of its bias toward the majority classes. The analysis used only sentences from electronic health records that contain at least two medical concepts. This limitation was imposed by the availability of annotated data with reported results, as relations were defined at sentence level.

Originality/value: The originality of the proposed solution lies in the methodology for extracting valuable information from medical records via semantic searches, concept-to-concept relation identification and recommendations for diagnosis, treatment and further investigations. The REMed solution introduces a learning-based approach for the automatic discovery of relations between medical concepts. The paper proposes an original list of features: lexical (3), context (6), grammatical (4) and syntactic (4). The similarity feature introduced in this paper has a significant influence on the classification and, to the best of the authors’ knowledge, has not been used as a feature in similar solutions.
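The abstract reports results as an F1-measure and attributes the TrIP/TrAP confusion to a bias toward majority classes. The following is a minimal sketch of how per-class and micro-averaged F1 can be computed for a multi-class relation classifier. It is not the authors' evaluation code: the abstract does not state which averaging scheme was used, the function name `f1_scores` is hypothetical, and the toy label lists are invented purely for illustration.

```python
from collections import Counter


def f1_scores(gold, predicted):
    """Return (per_class_f1, micro_f1) for two equal-length label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gained a wrong instance
            fn[g] += 1  # gold class lost one of its instances

    # Per-class F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when empty.
    per_class = {}
    for label in sorted(set(gold) | set(predicted)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0

    # Micro-averaged F1 pools the counts over all classes.
    total_tp = sum(tp.values())
    denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    micro = 2 * total_tp / denom if denom else 0.0
    return per_class, micro


# Invented toy data: TrAP is the majority class, TrIP the minority class.
gold = ["TrAP"] * 6 + ["TrIP"] * 2
predicted = ["TrAP"] * 6 + ["TrAP", "TrIP"]

per_class, micro = f1_scores(gold, predicted)
print(per_class)
print(micro)
```

In this toy setting a single TrIP instance misclassified as TrAP lowers the minority-class F1 far more than the pooled score, which is one way a bias toward majority classes can stay hidden behind an aggregate figure. When every instance carries exactly one label and all classes are counted, micro-averaged F1 equals plain accuracy.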

Publisher

Emerald

Subject

Computer Networks and Communications, Information Systems

References: 45 articles.

1. Enabling online studies of conceptual relationships between medical terms: developing an efficient web platform;JMIR Medical Informatics,2014

2. I2B2 2010 challenge: machine learning for information extraction from patient records,2010

3. An overview of MetaMap: historical perspective and recent advances;Journal of the American Medical Informatics Association,2010

4. Exploiting Word Meaning for Negation Identification in Electronic Health Records,2014

Cited by 9 articles.

1. Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models;PLOS ONE;2024-07-17

2. Feature Extraction of Dialogue Text Based on Big Data and Machine Learning;International Journal of Web-Based Learning and Teaching Technologies;2024-02-07

3. Multimedia Technology of Spatial Data Mining Based on Genetic Algorithm;Computational Intelligence and Neuroscience;2022-05-21

4. Instance-Based Learning Following Physician Reasoning for Assistance during Medical Consultation;Applied Sciences;2021-06-24

5. A Predictive Machine Learning Model to Determine Alcohol Use Disorder;2020 IEEE Symposium on Computers and Communications (ISCC);2020-07