Fd-CasBGRel: A Joint Entity–Relationship Extraction Model for Aquatic Disease Domains
Published: 2024-07-15
Issue: 14
Volume: 14
Page: 6147
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Ye Hongbao (1,2,3), Lv Lijian (1,2), Zhou Chengquan (2,3), Sun Dawei (2,3)
Affiliation:
1. College of Mathematics and Computer Science, Zhejiang A&F University, 666 Wusu Street, Hangzhou 311300, China
2. Agricultural Equipment Research Institute, Zhejiang Academy of Agricultural Sciences, 298 Desheng Middle Road, Hangzhou 310021, China
3. Key Laboratory of Agricultural Equipment in Southeast Hilly and Mountainous Areas of the Ministry of Agriculture and Rural Affairs (Ministry-Province Joint Construction), 298 Desheng Middle Road, Hangzhou 310021, China
Abstract
Entity–relationship extraction plays a pivotal role in the construction of domain knowledge graphs. In the aquatic disease domain, however, extraction is difficult because of overlapping relationships, highly specialized data, limited feature fusion, and imbalanced data samples, all of which significantly weaken extraction performance. To tackle these challenges, this study compiles a text corpus from published books and aquatic disease websites, builds datasets from it, and proposes the Fd-CasBGRel model, tailored specifically to the aquatic disease domain. The model uses the Casrel cascading binary tagging framework to address relationship overlap; applies task fine-tuning for better performance on aquatic disease data; trains on specialized aquatic disease corpora to improve adaptability; and integrates the BRC feature fusion module (which combines self-attention mechanisms, BiLSTM, relative position encoding, and conditional layer normalization) to exploit entity position and context for enhanced fusion. It also replaces the traditional cross-entropy loss function with the GHM loss function to mitigate category imbalance. Experimental results show that Fd-CasBGRel reached an F1 score of 84.71% on the aquatic disease dataset, significantly outperforming several benchmark models. The model thus effectively addresses the low triple-extraction performance caused by high data specialization, insufficient feature integration, and data imbalance. The model achieved the highest F1 score of 86.52% on the overlapping relationship category dataset, demonstrating a robust capability for extracting overlapping data. In comparative experiments on the publicly available WebNLG dataset, the model also obtained the best performance metrics among the compared models, indicating good generalization ability.
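To make the loss substitution mentioned in the abstract more concrete, the following is a minimal sketch of a GHM-style (gradient harmonizing mechanism) weighting applied to binary cross-entropy, the kind of per-position binary decision a cascading binary tagger makes. It is an illustration under stated assumptions, not the authors' implementation: the function names `ghm_weights` and `ghm_c_loss`, the choice of 10 equal-width bins, and the normalization by the number of non-empty bins are all assumptions made here for the example.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ghm_weights(grad_norms, bins=10):
    """Per-example weights inversely proportional to gradient density.

    grad_norms: values in [0, 1], here |p - y| for each example.
    Assumption: equal-width bins, normalized by the number of non-empty
    bins so that the weights sum to the number of examples.
    """
    n = len(grad_norms)
    # Map each gradient norm to a bin index; g == 1.0 falls in the last bin.
    idx = [min(int(g * bins), bins - 1) for g in grad_norms]
    counts = [0] * bins
    for i in idx:
        counts[i] += 1
    nonempty = sum(1 for c in counts if c > 0)
    # Examples in crowded bins get small weights, rare ones get large weights.
    return [n / (counts[i] * nonempty) for i in idx]


def ghm_c_loss(logits, labels, bins=10):
    """Density-weighted binary cross-entropy (hypothetical helper)."""
    eps = 1e-12  # guards against log(0)
    probs = [sigmoid(z) for z in logits]
    # Gradient norm of binary cross-entropy with respect to the logit.
    grads = [abs(p - y) for p, y in zip(probs, labels)]
    weights = ghm_weights(grads, bins)
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        bce = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        total += w * bce
    return total / len(logits)


# Nine easy positives and one hard positive: the crowded "easy" bin is
# down-weighted, so the abundant examples no longer dominate the loss.
logits = [5.0] * 9 + [-5.0]
labels = [1] * 10
print(ghm_weights([abs(sigmoid(z) - y) for z, y in zip(logits, labels)]))
print(ghm_c_loss(logits, labels))
```

When every example falls into the same gradient bin, all weights equal 1 and the sketch reduces to ordinary mean binary cross-entropy, which is a convenient sanity check for this kind of weighting.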
Funder
Key R&D Program of Zhejiang Agricultural Technology Cooperation Program in Zhejiang Province of China