A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks
-
Published:2023
Issue:1
Volume:21
Page:1489-1507
-
ISSN:1551-0018
-
Container-title:Mathematical Biosciences and Engineering
-
language:
-
Short-container-title:MBE
Author:
Zou Shuilong1, Liu Zhaoyang2, Wang Kaiqi2, Cao Jun2, Liu Shixiong1, Xiong Wangping2, Li Shaoyi1
Affiliation:
1. Nanchang Institute of Science & Technology, Nanchang 330004, China
2. School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China
Abstract
<abstract>
<p>Effective information extraction from pharmaceutical texts is of great significance for clinical research. Ancient Chinese medicine texts have terse sentences and complex semantic relationships, and textual relationships may exist between heterogeneous entities. Current mainstream relationship extraction models do not take into account the associations between entities and relationships during extraction, so they capture too little semantic information to form an effective structured representation. In this paper, we propose a heterogeneous graph neural network relationship extraction model adapted to traditional Chinese medicine (TCM) text. First, the given sentence and the predefined relationships are embedded with fine-tuned bidirectional encoder representations from transformers (BERT) word embeddings to serve as model input. Second, a heterogeneous graph network is constructed to associate word, phrase, and relationship nodes and obtain the hidden layer representation. Then, in the decoding stage, a two-stage subject-object entity identification method is adopted: the identifier uses a binary classifier to locate the start and end positions of the TCM entities, identifies all the subject-object entities in the sentence, and finally forms the TCM entity relationship group. Experiments on the TCM relationship extraction dataset show that the heterogeneous graph neural network embedded with BERT achieves a precision of 86.99% and an F1 value of 87.40%, improvements of 8.83% and 10.21% compared with the relationship extraction models CNN, BERT-CNN, and Graph LSTM.</p>
</abstract>
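The abstract's decoding step, in which a binary classifier marks the start and end positions of entities, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `decode_spans`, the 0.5 threshold, the nearest-end pairing rule, and the toy probabilities are all assumptions chosen for illustration. It only shows how per-token start and end scores could be turned into entity spans.

```python
from typing import List, Tuple


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Turn per-token start/end probabilities into (start, end) index spans.

    Each token whose start probability reaches the threshold is paired with
    the nearest token at or after it whose end probability also reaches the
    threshold. Starts with no matching end are dropped.
    """
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        for j in range(i, len(end_probs)):
            if end_probs[j] >= threshold:
                spans.append((i, j))
                break
    return spans


# Hypothetical classifier outputs for a six-token sentence.
start_probs = [0.9, 0.1, 0.2, 0.8, 0.1, 0.3]
end_probs = [0.1, 0.7, 0.2, 0.1, 0.9, 0.2]

print(decode_spans(start_probs, end_probs))  # [(0, 1), (3, 4)]
```

In a two-stage scheme of the kind the abstract describes, a decoder like this would presumably run once to find subject spans and again, conditioned on each subject and relationship, to find object spans; how the paper conditions the second stage is not stated in this record.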
Publisher
American Institute of Mathematical Sciences (AIMS)
Subject
Applied Mathematics,Computational Mathematics,General Agricultural and Biological Sciences,Modeling and Simulation,General Medicine