Authors:
Figênio, Mateus R.; Santanché, André; Gomes-Jr, Luiz
Abstract
Large Language Models (LLMs) have come to dominate performance benchmarks for Natural Language Processing (NLP) tasks, outshining many previous approaches to language modeling. Yet, despite this success and the increasingly broad and pervasive use of these models in everyday and specialized applications, little is known about how or why they produce the outputs they do. This study reviews the development of Language Models (LMs) and the advances in approaches to their explainability, focusing on methods for interpreting and explaining the neural network components of LMs (especially Transformer models) as a means of better understanding them.
Publisher
Sociedade Brasileira de Computação - SBC