Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey

Published: 2020-12-17

Authors:

Benyamin Ghojogh, Ali Ghodsi

Abstract

This is a tutorial and survey paper on the attention mechanism, transformers, BERT, and GPT. We first explain the attention mechanism, sequence-to-sequence models with and without attention, self-attention, and attention in different areas such as natural language processing and computer vision. Then, we explain transformers, which do not use any recurrence. We explain all the parts of the encoder and decoder in the transformer, including positional encoding, multihead self-attention and cross-attention, and masked multihead attention. Thereafter, we introduce the Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT) as stacks of transformer encoders and decoders, respectively. We explain their characteristics and how they work.

Publisher

Center for Open Science

Cited by 22 articles.

1. Bias in candidate sourcing communication: Investigating stereotypical gender- and age-related frames in online job advertisements at the sectoral level;Public Relations Review;2024-09

2. Systematic exploration and in-depth analysis of ChatGPT architectures progression;Artificial Intelligence Review;2024-08-16

3. Üretken Yapay Zekaya Dayalı Bireysel Emeklilik Bilgilendirme ve Öneri Sistemi [Generative Artificial Intelligence-Based Private Pension Information and Recommendation System];Bilişim Teknolojileri Dergisi;2024-07-31

4. Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges;International Journal of Multimedia Information Retrieval;2024-06-25

5. A Deep Learning-Based Method for Preventing Data Leakage in Electric Power Industrial Internet of Things Business Data Interactions;Sensors;2024-06-22