BACKGROUND
In recent years, medical informatization has made remarkable progress, and large numbers of electronic medical records are now stored in hospital information systems. With the continuous development of Natural Language Processing (NLP) technology, researchers are focusing on building Artificial Intelligence (AI) systems on top of large collections of electronic health records (EHRs). How to efficiently retrieve and recommend similar content from these massive EHR collections is an urgent, unsolved problem.
OBJECTIVE
This paper addresses three problems in medical text similarity: (1) the shortage of labeled data for supervised learning, (2) the inability of existing methods to generate high-quality sentence vectors, and (3) how to mine sufficient knowledge from unsupervised data.
METHODS
We propose the Bidirectional Cross-Dynamic Round-Robin Learning Encoder (BCDRRLE), a model that uses semi-supervised learning to generate high-quality sentence vectors when only a small amount of labeled data is available.
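As a minimal sketch of such a semi-supervised setup (everything below is an illustrative assumption, not the paper's implementation: the ToyEncoder, the SimCSE-style dropout views, and the loss weighting are all hypothetical), a shared encoder can be trained jointly on a small labeled pair set and on unlabeled sentences:

```python
# Hypothetical sketch: a shared encoder trained on a small labeled set
# (sentence pairs with similarity scores) plus an auxiliary objective on
# unlabeled sentences. All names and weights here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in encoder: embedding lookup + mean pooling over token ids."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        return self.emb(token_ids).mean(dim=1)  # sentence vector: (batch, dim)

encoder = ToyEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Tiny synthetic batches standing in for labeled pairs / unlabeled text.
a = torch.randint(0, 1000, (8, 16))   # labeled: sentence A token ids
b = torch.randint(0, 1000, (8, 16))   # labeled: sentence B token ids
gold = torch.rand(8)                  # labeled: gold similarity in [0, 1]
u = torch.randint(0, 1000, (8, 16))   # unlabeled sentences

for step in range(10):
    # Supervised part: regress cosine similarity toward the gold score.
    sim = F.cosine_similarity(encoder(a), encoder(b))
    sup_loss = F.mse_loss(sim, gold)

    # Unsupervised part (assumed): dropout noise yields two "views" of the
    # same unlabeled sentence; pull the two views together (SimCSE-style).
    v1 = F.dropout(encoder(u), p=0.1, training=True)
    v2 = F.dropout(encoder(u), p=0.1, training=True)
    unsup_loss = 1.0 - F.cosine_similarity(v1, v2).mean()

    loss = sup_loss + 0.5 * unsup_loss  # the 0.5 weighting is an assumption
    opt.zero_grad()
    loss.backward()
    opt.step()
```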
RESULTS
Our proposed BCDRRLE model outperforms state-of-the-art models on three medical text sentence similarity datasets. Furthermore, it also produces higher-quality sentence representations.
CONCLUSIONS
The experimental results demonstrate that the proposed BCDRRLE model still performs well when only a small amount of labeled data is available. By introducing contrastive learning and an enhanced denoising autoencoder, the model efficiently produces high-quality sentence representations. Our proposed dynamic round-robin learning algorithm further improves performance. The approach not only yields better results but is also general and can be extended to other task domains that rely on text representations.
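To make the scheduling idea concrete, here is a hedged sketch of what a dynamic round robin over training objectives could look like; the selection rule (cycle by default, but periodically revisit the objective with the highest recent loss) and the helper train_one_step are hypothetical illustrations, not the paper's algorithm:

```python
# Hypothetical sketch of dynamic round-robin scheduling over the objectives
# named above (contrastive learning, denoising autoencoder, supervised loss).
import random
from collections import deque

objectives = ["contrastive", "denoising_ae", "supervised"]
recent = {name: deque(maxlen=5) for name in objectives}  # recent loss values

def train_one_step(name):
    """Placeholder for one optimizer step on the given objective;
    here it simply simulates and returns a loss value."""
    return random.random()

def next_objective(step):
    # Plain round robin gives the default order...
    name = objectives[step % len(objectives)]
    # ...but every fourth step, jump to the objective with the highest
    # recent average loss, i.e., the one currently learned worst.
    if step % 4 == 3:
        name = max(objectives,
                   key=lambda n: sum(recent[n]) / max(len(recent[n]), 1))
    return name

for step in range(12):
    name = next_objective(step)
    loss = train_one_step(name)
    recent[name].append(loss)
```

In this reading, "dynamic" means the fixed rotation is occasionally overridden by a data-driven choice; the paper's actual selection criterion may differ.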