BACKGROUND
In recent years, artificial intelligence technology has developed rapidly and has become increasingly integrated with Chinese medicine. Its powerful data processing and pattern recognition capabilities are now widely used for in-depth mining of Chinese medicine information.
OBJECTIVE
To explore in depth the theoretical knowledge of Chinese medicine contained in Chinese medical cases, this paper investigates named entity recognition adapted to the corpus characteristics of Chinese medical cases, and addresses the model performance degradation and low classification accuracy caused by sample imbalance.
METHODS
Data augmentation was introduced to increase the diversity of the original samples, and loss weighting was introduced to reduce the weight of the majority class and increase the weight of the minority class. The contextual semantic information of words was extracted with the BERT two-layer bidirectional Transformer structure to produce a feature representation of the text, which was then passed to a BiLSTM-WCRF model to perform the downstream named entity recognition task.
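The loss-weighting idea can be illustrated with a minimal sketch. It assumes inverse-frequency class weights and a token-level weighted negative log-likelihood; the exact weighting scheme used inside the WCRF layer is not given here, so this is an illustration of the principle rather than the paper's loss. The function names and the label names "O" and "B-HERB" are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each label to n_samples / (n_classes * count).

    Frequent labels get a weight below 1 and rare labels a weight above 1.
    This scheme is an assumption, not the paper's stated formula.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def weighted_nll(probabilities, gold_labels, weights):
    """Weighted negative log-likelihood, normalised by the sum of weights.

    probabilities: one dict per token mapping label -> predicted probability.
    gold_labels:   the gold label of each token.
    weights:       dict mapping label -> class weight.
    """
    total = 0.0
    weight_sum = 0.0
    for probs, gold in zip(probabilities, gold_labels):
        w = weights[gold]
        total += -w * math.log(probs[gold])
        weight_sum += w
    return total / weight_sum


# Hypothetical imbalanced tag sequence: 8 majority tokens, 2 minority tokens.
tags = ["O"] * 8 + ["B-HERB"] * 2
class_weights = inverse_frequency_weights(tags)
```

With these weights, an error on a minority-class token contributes four times as much to the loss as an error of the same size on a majority-class token, which is the effect the loss-weighting step aims for.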
RESULTS
The experiments show that, with the introduction of the data augmentation method, the Macro-F1 value of the BERT-BiLSTM-CRF(EDA) model is 10.1% higher than that of the BiLSTM-CRF(EDA) model. With the loss weighting method introduced on top of EDA, the Macro-F1 value of the BERT-BiLSTM-WCRF(EDA) model is 6.8% higher than that of the BiLSTM-WCRF(EDA) model.
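For readers unfamiliar with the metric, Macro-F1 is the unweighted mean of the per-label F1 scores, so rare labels count as much as frequent ones. The sketch below computes it at the token level for illustration; whether the reported values are token-level or entity-level is not stated here, and the function name and example labels are hypothetical.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical example with two labels.
score = macro_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

Because each label contributes equally to the average, improvements on minority labels raise Macro-F1 more visibly than they would raise a frequency-weighted score.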
CONCLUSIONS
Introducing both data augmentation and loss weighting can mitigate overfitting while improving overall model performance as well as entity recognition for individual labels.
CLINICALTRIAL
None