Medical text classification based on the discriminative pre-training model and prompt-tuning

Published:2023-01 Volume:9
ISSN:2055-2076
Container-title:DIGITAL HEALTH
language:en

Author:

Wang Yu¹, Wang Yuan², Peng Zhenwan¹, Zhang Feifan¹, Zhou Luyao¹, Yang Fei¹

Affiliation:

1. School of Biomedical Engineering, Anhui Medical University, Hefei, China

2. Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China

Abstract

Medical text classification, a fundamental task in medical natural language processing, aims to identify the categories to which a short medical text belongs. Current research has focused on performing this task by fine-tuning a pre-trained language model. However, this paradigm introduces additional parameters because an extra classifier must be trained. Recent studies have shown that the “prompt-tuning” paradigm yields better performance in many natural language processing tasks because it bridges the gap between pre-training objectives and downstream tasks. The main idea of prompt-tuning is to transform binary or multi-class classification tasks into mask prediction tasks, fully exploiting the features learned by pre-trained language models. This study explores, for the first time, how to classify medical texts through prompt-tuning with a discriminative pre-trained language model called ERNIE-Health. Specifically, we perform prompt-tuning based on the multi-token selection task, which is one of the pre-training tasks of ERNIE-Health. The raw text is wrapped into a new sequence with a template in which the category label is replaced by a [UNK] token. The model is then trained to calculate the probability distribution over the candidate categories. Our method is tested on the KUAKE-Question Intention Classification and CHiP-Clinical Trial Criterion datasets and achieves accuracies of 0.866 and 0.861, respectively. In addition, the loss of our model decreases faster throughout the training period than it does with fine-tuning. The experimental results provide valuable insights to the community and suggest that prompt-tuning is a promising approach to improving the performance of pre-trained models in domain-specific tasks.
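
The wrapping and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not call ERNIE-Health: the template wording, the helper names `wrap_with_template` and `candidate_distribution`, the `label_length` parameter, and the candidate scores are all assumptions made for illustration. The scores stand in for whatever the model would assign to each candidate label at the [UNK] positions.

```python
import math

UNK = "[UNK]"  # placeholder token named in the abstract


def wrap_with_template(text, label_length, template="{text} Category: {mask}"):
    """Wrap raw text in a (hypothetical) template whose label slot holds [UNK] tokens."""
    mask = " ".join([UNK] * label_length)
    return template.format(text=text, mask=mask)


def candidate_distribution(scores):
    """Turn per-candidate scores into a probability distribution with a softmax."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


prompt = wrap_with_template("What causes frequent headaches?", label_length=1)

# Made-up scores standing in for model outputs over the candidate categories.
scores = {"diagnosis": 0.4, "cause": 2.1, "treatment": 0.9}
probs = candidate_distribution(scores)
predicted = max(probs, key=probs.get)
```

In this sketch the predicted category is simply the candidate with the highest probability; in a real setup the scores would come from the pre-trained model rather than a hand-written dictionary.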

Funder

Initiation Fund of Anhui Medical University

Natural Science Foundation of Anhui Province of China

Publisher

SAGE Publications

Subject

Health Information Management,Computer Science Applications,Health Informatics,Health Policy

Link

http://journals.sagepub.com/doi/pdf/10.1177/20552076231193213

Reference: 48 articles.

1. Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models

2. Artificial intelligence (AI) systems for interpreting complex medical datasets

3. Clinical Natural Language Processing in languages other than English: opportunities and challenges

4. Predicting death risk analysis in fully vaccinated people using novel extreme regression-voting classifier

5. Clinical text classification research trends: Systematic literature review and open issues

Cited by 7 articles.

1. Prompt Engineering Paradigms for Medical Applications: Scoping Review;Journal of Medical Internet Research;2024-09-10

2. Prompt Engineering Paradigms for Medical Applications: Scoping Review (Preprint);2024-05-14

3. A hybrid natural language processing model for short text classification using BCBLEM;2024 3rd International Conference on Artificial Intelligence For Internet of Things (AIIoT);2024-05-03

4. Large Language Models in Randomized Controlled Trials Design;2024-04-26

5. Hierarchical Text Classification of Chinese Public Security Cases Based on ERNIE 3.0 Model;2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL);2024-04-19