ALBERT-Based Self-Ensemble Model With Semisupervised Learning and Data Augmentation for Clinical Semantic Textual Similarity Calculation: Algorithm Validation Study

Published:2021-01-22 Issue:1 Volume:9 Page:e23086
ISSN:2291-9694
Container-title:JMIR Medical Informatics
language:en
Short-container-title:JMIR Med Inform

Author:

Li Junyi, Zhang Xuejie, Zhou Xiaobing

Abstract

Background: In recent years, as the amount of available information and the importance of information screening have grown, the calculation of textual semantic similarity has received increased attention. In medicine, electronic medical records and medical research documents have become important data resources for clinical research, and medical textual semantic similarity calculation has become an urgent problem to be solved.

Objective: This research aims to solve 2 problems: (1) medical data sets are small, leading to insufficient learning and understanding by the models, and (2) information is lost in the process of long-distance propagation, leaving the models unable to grasp key information.

Methods: This paper combines a text data augmentation method and a self-ensemble ALBERT model under semisupervised learning to perform clinical textual semantic similarity calculations.

Results: Compared with the methods in the 2019 National Natural Language Processing Clinical Challenges Open Health Natural Language Processing shared task Track on Clinical Semantic Textual Similarity, our method surpasses the best result by 2 percentage points and achieves a Pearson correlation coefficient of 0.92.

Conclusions: When a medical data set is small, data augmentation can increase its size, and improved semisupervised learning can boost the learning efficiency of the model. Additionally, self-ensemble methods improve model performance. Our method performed excellently and has great potential to help address related medical problems.

Publisher

JMIR Publications Inc.

Subject

Health Information Management,Health Informatics

References: 35 articles.

1. Energy Efficient Calculations of Text Similarity Measure on FPGA-Accelerated Computing Platforms

2. Short text similarity based on probabilistic topics

3. An Improved Text Similarity Calculation Algorithm Based on VSM

4. Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art

Cited by 6 articles.

1. BERT-Based Neural Network for Inpatient Fall Detection From Electronic Medical Records: Retrospective Cohort Study;JMIR Medical Informatics;2024-01-30

2. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review;JMIR Medical Informatics;2023-12-15

3. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review (Preprint);2022-09-05

4. Identifying infected patients using semi-supervised and transfer learning;Journal of the American Medical Informatics Association;2022-07-23

5. Six sigma robust optimization method based on a pseudo single-loop strategy and RFR-DBN with insufficient samples;Computers & Structures;2021-12