ATICVis: A Visual Analytics System for Asymmetric Transformer Models Interpretation and Comparison

Published: 2023-01-26 Volume: 13 Issue: 3 Page: 1595
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en

Author:

Wu Jian-Lin¹, Chang Pei-Chen¹, Wang Chao¹, Wang Ko-Chih¹

Affiliation:

1. Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 116, Taiwan

Abstract

In recent years, natural language processing (NLP) technology has made great progress, and transformer-based models have performed well on a wide range of NLP problems. However, a given natural language task can be carried out by multiple models with slightly different architectures, such as different numbers of layers and attention heads. In addition to using quantitative indicators as the basis for model selection, many users also consider a model’s language understanding ability and the computing resources it requires. Comparing and deeply analyzing two transformer-based models with different numbers of layers and attention heads is not easy, because there is no inherent one-to-one match between the models. Comparing models with different architectures is therefore a crucial and challenging task when users train, select, or improve models for their NLP tasks. In this paper, we develop a visual analysis system that helps machine learning experts deeply interpret and compare the pros and cons of asymmetric transformer-based models when the models are applied to a user’s target NLP task. We propose metrics that evaluate the similarity between layers or attention heads, helping users identify valuable combinations of layers and attention heads to compare. Our visual tool provides an interactive overview-to-detail framework for users to explore when and why models behave differently. In the use cases, users apply our visual tool to find and explain why a large model does not significantly outperform a small model and to understand the linguistic features captured by layers and attention heads. The use cases and user feedback show that our tool can help people gain insight and facilitate model comparison tasks.
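
The abstract does not state which similarity metrics the system uses, so the following is only an illustrative sketch of how attention heads from two models with different head counts could be compared without a one-to-one match. It assumes (this is not taken from the paper) that both models produce attention maps over the same token sequence and that similarity is scored as one minus the Jensen-Shannon divergence between attention distributions. The function names `js_divergence`, `head_similarity`, and `random_attention` are hypothetical, and the attention maps are random stand-ins rather than real model output.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) along the last axis; result lies in [0, 1]."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    p = p / p.sum(axis=-1, keepdims=True)
    q = q / q.sum(axis=-1, keepdims=True)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log2(p / m), axis=-1)
    kl_qm = np.sum(q * np.log2(q / m), axis=-1)
    return 0.5 * (kl_pm + kl_qm)

def head_similarity(attn_a, attn_b):
    """Score every head of model A against every head of model B.

    attn_a: (heads_a, seq, seq), attn_b: (heads_b, seq, seq); each row sums to 1.
    Returns a (heads_a, heads_b) matrix where 1.0 means identical attention.
    """
    # Broadcast to (heads_a, heads_b, seq, seq); divergence is taken per query token.
    jsd = js_divergence(attn_a[:, None], attn_b[None, :])
    # Average over query tokens and turn the divergence into a similarity.
    return 1.0 - jsd.mean(axis=-1)

def random_attention(rng, heads, seq):
    """Random row-stochastic attention maps (stand-in data, not real model output)."""
    logits = rng.normal(size=(heads, seq, seq))
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
small = random_attention(rng, heads=4, seq=6)    # e.g. a model with 4 heads
large = random_attention(rng, heads=12, seq=6)   # e.g. a model with 12 heads
large[7] = small[2]                              # plant one matching head

sim = head_similarity(small, large)   # shape (4, 12)
best_match = sim.argmax(axis=1)       # most similar large-model head per small-model head
```

Because the result is a rectangular matrix, it works for any pair of head counts; a user could scan it for high-scoring pairs worth inspecting in detail, which is the kind of selection the abstract describes.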

Funder

National Science and Technology Council

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/3/1595/pdf

References (46 articles)

1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., and Polosukhin, I. (2017, January 4–9). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA.

2. Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv.

3. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., and Le, Q.V. (2019, January 8–14). Xlnet: Generalized autoregressive pretraining for language understanding. Proceedings of the 33rd Conference on Neural Information Processing Systems, Vancouver, BC, Canada.

4. Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training, OpenAI.

5. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog.

Cited by 2 articles.

1. EXplainable Artificial Intelligence (XAI)—From Theory to Methods and Applications;IEEE Access;2024

2. Multi-Task Transformer Visualization to build Trust for Clinical Outcome Prediction;2023 Workshop on Visual Analytics in Healthcare (VAHC);2023-10-22