Biomedical Text Classification Using Augmented Word Representation Based on Distributional and Relational Contexts

Published:2023-02-15 Issue: Volume:2023 Page:1-22
ISSN:1687-5273
Container-title:Computational Intelligence and Neuroscience
language:en

Author:

Parwez Md. Aslam¹, Fazil Mohd.², Arif Muhammad³, Nafis Md Tabrez¹, Auwul Md. Rabiul⁴

Affiliation:

1. Department of Computer Science & Engineering, Jamia Hamdard, New Delhi, India

2. University of Limerick, Limerick, Ireland

3. Department of Computer Science, Superior University Lahore, Lahore 54000, Pakistan

4. Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Agricultural University, Gazipur 1706, Bangladesh

Abstract

Due to the increasing use of information technologies by biomedical experts, researchers, public health agencies, and healthcare professionals, the volume of scientific literature, clinical notes, and other structured and unstructured text resources is rapidly increasing, and these resources are being stored in various data sources such as PubMed. These massive text resources can be leveraged to extract valuable knowledge and insights using machine learning techniques. Neural network-based classification models, which take numeric vectors (also known as word representations) of the training data as input, have recently gained popularity. The better the input vectors, the more accurate the classification. Word representations are learned as the distribution of words in an embedding space, wherein each word has its own vector and words that are semantically similar based on their contexts appear near each other. However, such distributional word representations are incapable of encapsulating relational semantics between distant words. In the biomedical domain, relation mining is a well-studied problem that aims to extract relational words, which associate distant entities that generally represent the subject and object of a sentence. Our goal is to capture the relational semantic information between distant words from a large corpus to learn enhanced word representations and to employ the learned word representations for various natural language processing tasks such as text classification. In this article, we propose an application of biomedical relation triplets to learn word representations by incorporating relational semantic information within the distributional representation of words. In other words, the proposed approach aims to capture both the distributional and relational contexts of words to learn their numeric vectors from a text corpus. We also propose an application of the learned word representations to text classification.
The proposed approach is evaluated over multiple benchmark datasets, and the efficacy of the learned word representations is tested on word similarity and concept categorization tasks. Our proposed approach performs better than the state-of-the-art GloVe model. Furthermore, we applied the learned word representations to classify biomedical texts using four neural network-based classification models, and the classification accuracy further confirms the effectiveness of the word representations learned by our proposed approach.

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science

Link

http://downloads.hindawi.com/journals/cin/2023/2989791.pdf

References (74 articles)

1. Automatic assignment of biomedical categories: toward a generic approach

2. MeSH Up: effective MeSH text classification for improved document retrieval

3. Distributed representations of words and phrases and their compositionality; T. Mikolov

4. GloVe: Global Vectors for Word Representation

5. WordNet