Saturated Transformers are Constant-Depth Threshold Circuits

Published:2022 Volume:10 Page:843-856
ISSN:2307-387X
Container-title:Transactions of the Association for Computational Linguistics
language:en

Author:

Merrill William¹²,Sabharwal Ashish³,Smith Noah A.⁴⁵

Affiliation:

1. Allen Institute for AI, USA. willm@nyu.edu

2. New York University, USA

3. Allen Institute for AI, USA. ashishs@allenai.org

4. Allen Institute for AI, USA. noah@allenai.org

5. University of Washington, USA

Abstract

Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with hard attention are quite limited in power (Hahn, 2020), as they can be simulated by constant-depth AND/OR circuits (Hao et al., 2022). However, hard attention is a strong assumption, which may complicate the relevance of these results in practice. In this work, we analyze the circuit complexity of transformers with saturated attention: a generalization of hard attention that more closely captures the attention patterns learnable in practical transformers. We first show that saturated transformers transcend the known limitations of hard-attention transformers. We then prove saturated transformers with floating-point values can be simulated by constant-depth threshold circuits, giving the class TC⁰ as an upper bound on the formal languages they recognize.
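As a rough illustration of the distinction the abstract draws, the sketch below contrasts hard attention with saturated attention on a toy input. It assumes the common reading of saturated attention as a uniform average over all positions tied for the maximum score; the abstract itself does not spell out the definition. All function names are hypothetical, and the leftmost tie-break in `hard_attention` is an assumption made only for this example.

```python
from fractions import Fraction


def hard_attention(scores, values):
    """Attend to one maximal-score position (leftmost tie-break is an assumption)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return Fraction(values[best])


def saturated_attention(scores, values):
    """Average the values uniformly over every position tied for the maximum score."""
    top = max(scores)
    tied = [v for s, v in zip(scores, values) if s == top]
    return Fraction(sum(tied), len(tied))


def majority_via_saturated_attention(bits):
    """Hypothetical helper: with all scores equal, saturated attention yields the
    mean of the bits, so comparing it to 1/2 decides strict majority."""
    scores = [0] * len(bits)
    return saturated_attention(scores, bits) > Fraction(1, 2)


def threshold_gate(inputs, k):
    """Threshold gate: True iff at least k of the inputs are 1."""
    return sum(inputs) >= k


bits = [1, 0, 1, 1, 0]
print(hard_attention([0] * len(bits), bits))       # sees only one position
print(saturated_attention([0] * len(bits), bits))  # sees the mean of all positions
print(majority_via_saturated_attention(bits))
print(threshold_gate(bits, len(bits) // 2 + 1))
```

With equal scores, the hard-attention head reads a single bit while the saturated head recovers the fraction of ones, which is the kind of counting a single threshold gate also performs. This is only a toy model of one attention head, not the construction used in the paper's proofs.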

Publisher

MIT Press

Subject

Artificial Intelligence,Computer Science Applications,Linguistics and Language,Human-Computer Interaction,Communication

Link

https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00493/2038506/tacl_a_00493.pdf

Reference 26 articles.

1. Computational Complexity

2. Layer normalization;Ba;ArXiv,2016

3. On the ability and limitations of transformers to recognize formal languages;Bhattamishra,2020

4. Liliana Cojocaru. 2016. Advanced Studies on the Complexity of Formal Languages. Ph.D. thesis, University of Tampere.

Cited by 4 articles.

1. Improved Linear Decomposition of Majority and Threshold Boolean Functions;IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems;2023-11

2. A Pragmatic Approach to Syntax Repair;Companion Proceedings of the 2023 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity;2023-10-22

3. The Parallelism Tradeoff: Limitations of Log-Precision Transformers;Transactions of the Association for Computational Linguistics;2023

4. Formal Languages and the NLP Black Box;Developments in Language Theory;2023