BACKGROUND
Extracting relations between the entities from Chinese electronic medical records(EMRs) is the key to automatically constructing medical knowledge graphs. Due to the less available labeled corpus, most of the current researches are based on shallow networks, which cannot fully capture the complex semantic features in the text of Chinese EMRs.
OBJECTIVE
In this study, a hybrid deep learning method based on semi-supervised learning is proposed to extract the entity relations from small-scale complex Chinese EMRs.
METHODS
The semantic features of sentences are extracted by residual network (ResNet) and the long dependent information is captured by bidirectional GRU (Gated Recurrent Unit). Then the attention mechanism is used to assign weights to the extracted features respectively, and the output of the two attention mechanisms is integrated for relation prediction. We adjusted the training process with manually annotated small-scale relational corpus and bootstrapping semi-supervised learning algorithm, and continuously expanded the datasets during the training process.
RESULTS
The experimental results show that the best F1-score of the proposed method on the overall relation categories reaches 89.78%, which is 13.07% higher than the baseline CNN model. The F1-score on DAP, SAP, SNAP, TeRD, TeAP, TeCP, TeRS, TeAS, TrAD, TrRD and TrAP 11 relation categories reaches 80.95%, 93.91%, 92.96%, 88.43%, 86.54%, 85.58%, 87.96%, 94.74%, 93.01%, 87.58% and 95.48%, respectively.
CONCLUSIONS
The hybrid neural network method strengthens the feature transfer and reuse between different network layers and reduces the cost of manual tagging relations. The results demonstrate that our proposed method is effective for the relation extraction in Chinese EMRs.