Medical nearest-word embedding technique implemented using an unsupervised machine learning approach for Bengali language

Published:2024-04-01 Issue:1 Volume:17
ISSN:1178-5608
Container-title:International Journal on Smart Sensing and Intelligent Systems
language:en

Author:

Mandal Kailash Pati¹, Mukherjee Prasenjit¹, Vishnu Devraj², Chakraborty Baisakhi¹, Choudhury Tanupriya³⁴, Arya Pradeep Kumar⁵

Affiliation:

1. Computer Science and Engineering Department, National Institute of Technology, Durgapur, India

2. School of Computing Science and Engineering, VIT Bhopal, Bhopal, India

3. CSE Department, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India

4. CSE Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India

5. School of Computer Science Engineering and Technology (SCSET), Bennett University, Greater Noida, Uttar Pradesh, India

Abstract

The rapid growth of natural language processing (NLP) applications, such as text summarization, speech recognition, information extraction, and machine translation, has increased interest in generating structured query language (SQL) from natural language to extract information from structured data. However, due to limited resources, converting Natural Language (NL) queries to SQL in Bengali is challenging. This article proposes an unsupervised machine learning model that finds semantically close Bengali words, which are then used to generate SQL from NL queries in Bengali. The main objective of the proposed system is to support the creation of patient-oriented explanations and educational resources by simplifying intricate medical terminology. The major findings are as follows: the use of machine translation in the field of medicine facilitates the dissemination of healthcare information to a diverse international audience and improves the performance of entity recognition tasks, including the identification of medical conditions, drugs, or procedures within clinical notes or electronic health data. The system allows a non-expert user to extract health-related information from a structured healthcare database without any knowledge of SQL. It accepts a query in Bengali and generates a response to that query in Bengali. Query tokenization and stop word removal are carried out in the preprocessing stage, and unsupervised machine learning techniques are then applied to the input query sentence. Tokenized words are converted into vectors using the skip-gram model, with noise-contrastive estimation (NCE) applied to discriminate between actual context words and irrelevant (noise) words. Stochastic gradient descent (SGD) optimizes the model using small, randomly chosen samples of the dataset, and cosine similarity measures how close two word vectors are. The semantically closest words found by this unsupervised method are used to generate the SQL.
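The preprocessing and nearest-word steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the toy three-dimensional vectors, and the function names (`preprocess`, `skipgram_pairs`, `cosine_similarity`, `nearest_word`) are all hypothetical, and the NCE/SGD training that would actually produce the embeddings is omitted. The sketch only shows whitespace tokenization with stop word removal, how skip-gram (center, context) training pairs are formed, and how cosine similarity picks the closest word.

```python
import math

# Hypothetical stop-word list; the paper's actual list is not given here.
STOP_WORDS = {"এর", "কি"}

# Toy, hand-written vectors standing in for trained skip-gram embeddings.
# Words: fever, temperature, medicine (illustrative values only).
TOY_EMBEDDINGS = {
    "জ্বর": [0.9, 0.1, 0.0],
    "তাপমাত্রা": [0.8, 0.2, 0.1],
    "ওষুধ": [0.1, 0.9, 0.2],
}


def preprocess(query):
    """Split a query on whitespace and drop stop words."""
    cleaned = query.replace("?", " ").replace("।", " ")
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs as used to train a skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_word(word, embeddings):
    """Return the (other_word, similarity) pair with the highest cosine similarity."""
    target = embeddings[word]
    best_word, best_score = None, -2.0  # cosine similarity is never below -1
    for other, vector in embeddings.items():
        if other == word:
            continue
        score = cosine_similarity(target, vector)
        if score > best_score:
            best_word, best_score = other, score
    return best_word, best_score


if __name__ == "__main__":
    tokens = preprocess("রোগীর জ্বর এর ওষুধ কি?")
    print(tokens)
    print(skipgram_pairs(tokens, window=1))
    print(nearest_word("জ্বর", TOY_EMBEDDINGS))
```

In a full system along the lines the abstract describes, the nearest word found this way would presumably be matched against table or column names of the healthcare database when composing the SQL; that mapping step is not shown here.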

Publisher

Walter de Gruyter GmbH

Link

https://www.sciendo.com/pdf/10.2478/ijssis-2024-0018
