Research on entity recognition of Chinese medical case text for stroke disease by integrating data enhancement and loss weighting (Preprint)

Author:

Zhou Jiawei, Su Tong, Liu Xiufeng

Abstract

BACKGROUND

In recent years, artificial intelligence technology has developed rapidly and has become increasingly integrated with Chinese medicine. Its powerful data processing capabilities and pattern recognition techniques are widely used for in-depth mining of Chinese medicine information.

OBJECTIVE

To explore the theoretical knowledge of Chinese medicine contained in Chinese medical cases, this paper studies named entity recognition tailored to the corpus characteristics of Chinese medical cases, and addresses the degraded model performance and low classification accuracy caused by sample imbalance.

METHODS

Data augmentation methods are introduced to increase the diversity of the original samples, and loss-weighting methods are introduced to reduce the weight of the majority class and increase the weight of the minority class. The contextual semantic information of words is extracted with the BERT two-layer bidirectional Transformer structure to produce feature representations of the text, and the output is then fed to a BiLSTM-WCRF model to perform the downstream named entity recognition task.
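The loss-weighting idea described above can be illustrated with a minimal sketch. It assumes inverse-frequency class weights and a token-level weighted negative log-likelihood; the abstract does not specify how the WCRF weights are computed, so this is not the authors' formulation. The function names and the labels `O` and `B-SYM` are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(label_sequences):
    """Assumed scheme: weight[c] = total_tokens / (num_classes * count[c]).

    Frequent labels get weights below 1, rare labels get weights above 1.
    """
    counts = Counter(label for seq in label_sequences for label in seq)
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


def weighted_nll(probs, gold, weights):
    """Weighted token-level negative log-likelihood.

    probs: one dict per token mapping label -> predicted probability.
    gold:  the gold label for each token.
    The sum of weighted losses is normalised by the sum of the weights used.
    """
    loss = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, gold))
    norm = sum(weights[y] for y in gold)
    return loss / norm


# Hypothetical toy corpus: 7 majority-class tokens, 1 minority-class token.
toy_labels = [["O", "O", "O", "B-SYM"], ["O", "O", "O", "O"]]
weights = inverse_frequency_weights(toy_labels)

# Same per-token confidence (0.9 on "O", 0.1 on "B-SYM") for every token.
probs = [{"O": 0.9, "B-SYM": 0.1}] * 4
gold = ["O", "O", "O", "B-SYM"]
uniform = {"O": 1.0, "B-SYM": 1.0}
print(weights)
print(weighted_nll(probs, gold, uniform), weighted_nll(probs, gold, weights))
```

With these weights the single misclassified minority token contributes far more to the loss than it does under uniform weights, which is the effect the loss-weighting step aims for.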

RESULTS

The experiments show that, with the data augmentation method (EDA) introduced, the Macro-F1 value of the BERT-BiLSTM-CRF(EDA) model is 10.1% higher than that of the BiLSTM-CRF(EDA) model. With the loss-weighting method added on top of EDA, the Macro-F1 value of the BERT-BiLSTM-WCRF(EDA) model is 6.8% higher than that of the BiLSTM-WCRF(EDA) model.
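Macro-F1 averages the per-label F1 scores with equal weight, so rare labels count as much as frequent ones. The sketch below shows why it is sensitive to sample imbalance. It is a simplified token-level version; the abstract does not say whether the reported scores are token-level or entity-level, and the labels `O` and `SYM` are hypothetical.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# A model that always predicts the majority label on an imbalanced toy set.
gold = ["O"] * 8 + ["SYM"] * 2
pred = ["O"] * 10

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)              # 0.8
print(macro_f1(gold, pred))  # about 0.44: F1 is 8/9 for "O" and 0 for "SYM"
```

Accuracy looks acceptable here, while Macro-F1 exposes the complete failure on the minority label.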

CONCLUSIONS

Introducing both data augmentation and loss weighting can mitigate overfitting while improving overall model performance as well as entity recognition for individual labels.

CLINICALTRIAL

None

Publisher

JMIR Publications Inc.
