BCDRRLE: A Bidirectional Cross-Dynamic Round Robin Learning Encoder Model for Medical Sentence Similarity (Preprint)

Published: 2022-03-20

Author:

Huang Bo

Abstract

BACKGROUND

In recent years, medical informatization has made remarkable progress, and large numbers of electronic medical records are now stored in electronic medical systems. As natural language processing (NLP) technology continues to advance, researchers are increasingly focused on developing artificial intelligence (AI) systems that use large collections of electronic health records (EHRs). Efficiently retrieving and recommending similar content from massive EHR collections is an urgent problem.

OBJECTIVE

This paper addresses three problems in medical text similarity: (1) labeled data for supervised learning are insufficient; (2) existing methods cannot generate high-quality sentence vectors; and (3) it is unclear how to mine enough knowledge from unsupervised (unlabeled) data.

METHODS

We propose BCDRRLE, a bidirectional cross-dynamic round robin (polling) learning encoder model. The model uses semi-supervised learning to generate high-quality sentence vectors when only a small amount of labeled data is available.
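
The abstract does not say how two sentence vectors are compared once the encoder has produced them. As a minimal sketch, assuming cosine similarity (a common choice for sentence similarity, not something stated by the paper), the comparison could look like the following. The function name `cosine_similarity` and the zero-vector convention are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two sentence vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat an all-zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)
```

A score near 1 would indicate two sentences whose vectors point in nearly the same direction; a score near 0 would indicate unrelated vectors.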

RESULTS

Our proposed Bidirectional Cross-Dynamic Round Robin Learning Encoder (BCDRRLE) model outperforms state-of-the-art models on three medical sentence similarity datasets. Furthermore, our model can also produce higher-quality sentence representations.

CONCLUSIONS

The experimental results demonstrate that our proposed BCDRRLE model still performs well when only a small amount of data is available. By introducing contrastive learning and an enhanced denoising autoencoder, our model efficiently produces high-quality sentence representations. Our proposed dynamic round robin (polling) learning algorithm further improves the model's performance. Our approach not only yields better performance but is also more general and can be extended to other tasks that use text representations.
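
The abstract does not specify the contrastive objective. As a generic illustration only, an InfoNCE-style loss with in-batch negatives, a common formulation in contrastive sentence-embedding work, can be sketched as below. The function names, the temperature value, and the use of in-batch negatives are assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def info_nce_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.05,  # assumed value, not from the paper
) -> float:
    """Mean InfoNCE loss: positives[i] is the positive for anchors[i];
    every other positives[j] in the batch acts as a negative."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [_cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)
```

The loss is small when each anchor is closest to its own positive and grows when an anchor is closer to another sentence in the batch, which is the behavior a contrastive objective is meant to reward and penalize.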

Publisher

JMIR Publications Inc.
