1. Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019, June 2–7). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Minneapolis, MN, USA.
2. Peng, Y., Yan, S., and Lu, Z. (2019, August 1). Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. Proceedings of the 2019 Workshop on Biomedical Natural Language Processing (BioNLP 2019), Florence, Italy.
3. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv.
4. Tenney, I., Das, D., and Pavlick, E. (2019, July 28–August 2). BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy.
5. Miaschi, A., Brunato, D., Dell’Orletta, F., and Venturi, G. (2020, December 8–13). Linguistic Profiling of a Neural Language Model. Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online).