MKDAT: Multi-Level Knowledge Distillation with Adaptive Temperature for Distantly Supervised Relation Extraction
Published: 2024-06-30
Issue: 7
Volume: 15
Page: 382
ISSN: 2078-2489
Container-title: Information
Language: en
Short-container-title: Information
Author:
Long Jun 1, Yin Zhuoying 1,2, Han Yan 3, Huang Wenti 4
Affiliation:
1. Big Data Institute, Central South University, Changsha 410075, China
2. Guizhou Rural Credit Union, Guiyang 550000, China
3. School of Computer and Information Engineering, Guizhou University of Commerce, Guiyang 550025, China
4. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411100, China
Abstract
Distantly supervised relation extraction (DSRE), first introduced to overcome the cost of manual annotation by automatically labeling data with triplet facts, is prone to mislabeled instances caused by noisy annotations. To address this interference, we leveraged a novel knowledge distillation (KD) method that differs from conventional DSRE models. Specifically, we proposed a model-agnostic KD method, Multi-Level Knowledge Distillation with Adaptive Temperature (MKDAT), which comprises two modules: Adaptive Temperature Regulation (ATR) and Multi-Level Knowledge Distilling (MKD). ATR assigns each training instance an entropy-based distillation temperature so that the student receives moderately softened supervision; for high-entropy instances, the labels may instead be hardened. MKD combines the teacher's bag-level and instance-level knowledge as supervision for the student, training the teacher at the bag level and the student at the instance level, which mitigates the effect of noisy annotations and improves sentence-level prediction performance. In addition, we implemented three MKDAT models based on the CNN, PCNN, and ATT-BiLSTM neural networks, respectively, and the experimental results show that our distillation models outperform the baseline models in both bag-level and instance-level evaluations.
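As a rough illustration of the ATR idea described above, the following is a minimal sketch, assuming PyTorch; the entropy-to-temperature schedule, the t_min/t_max bounds, and the alpha loss weighting are hypothetical reconstructions from the abstract, not the paper's exact formulation, and the bag-level/instance-level combination of MKD is omitted.

```python
# Minimal sketch of entropy-adaptive temperature distillation (assumes PyTorch).
# entropy_to_temperature and alpha are illustrative assumptions, not the
# authors' released formulas.
import torch
import torch.nn.functional as F

def entropy_to_temperature(teacher_logits, t_min=0.5, t_max=4.0):
    """Map each instance's teacher-prediction entropy to a temperature.
    High-entropy (noisy-looking) instances get a temperature below 1,
    which hardens their soft labels; confident instances get a higher
    temperature and thus softer supervision."""
    probs = F.softmax(teacher_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(teacher_logits.size(-1))))
    # Illustrative linear schedule: temperature falls as entropy rises.
    return t_max - (t_max - t_min) * (entropy / max_entropy)

def adaptive_kd_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Hard-label cross-entropy plus a per-instance, temperature-scaled
    KL term against the teacher's softened predictions."""
    temp = entropy_to_temperature(teacher_logits).unsqueeze(-1)  # (B, 1)
    soft_teacher = F.softmax(teacher_logits / temp, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temp, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="none").sum(-1)
    kd = (kd * temp.squeeze(-1) ** 2).mean()  # usual T^2 gradient scaling
    ce = F.cross_entropy(student_logits, labels)
    return alpha * ce + (1 - alpha) * kd

# Toy usage: a batch of 8 instances over 53 relation classes
# (53 is the NYT-10 label count, used here only as an example).
if __name__ == "__main__":
    s = torch.randn(8, 53)
    t = torch.randn(8, 53)
    y = torch.randint(0, 53, (8,))
    print(adaptive_kd_loss(s, t, y).item())
```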
Funder
Department of Education of Hunan Province