Large language models implicitly learn to straighten neural sentence trajectories to construct a predictive representation of natural language
Authors:
Eghbal A. Hosseini, Evelina Fedorenko
Abstract
Predicting upcoming events is critical to our ability to effectively interact with our environment and conspecifics. In natural language processing, transformer models, which are trained on next-word prediction, appear to construct a general-purpose representation of language that can support diverse downstream tasks. However, we still lack an understanding of how a predictive objective shapes such representations. Inspired by recent work in vision neuroscience (Hénaff et al., 2019), here we test a hypothesis about predictive representations of autoregressive transformer models. In particular, we test whether the neural trajectory of a sequence of words in a sentence becomes progressively straighter as it passes through the layers of the network. The key insight behind this hypothesis is that straighter trajectories should facilitate prediction via linear extrapolation. We quantify straightness using a 1-dimensional curvature metric, and present four findings in support of the trajectory straightening hypothesis: i) In trained models, the curvature progressively decreases from the first to the middle layers of the network. ii) Models that perform better on the next-word prediction objective, including larger models and models trained on larger datasets, exhibit greater decreases in curvature, suggesting that this improved ability to straighten sentence neural trajectories may be the underlying driver of better language modeling performance. iii) Given the same linguistic context, the sequences that are generated by the model have lower curvature than the ground truth (the actual continuations observed in a language corpus), suggesting that the model favors straighter trajectories for making predictions. iv) A consistent relationship holds between the average curvature and the average surprisal of sentences in the middle layers of models, such that sentences with straighter neural trajectories also have lower surprisal. Importantly, untrained models do not exhibit these behaviors. In tandem, these results support the trajectory straightening hypothesis and provide a possible mechanism for how the geometry of the internal representations of autoregressive models supports next-word prediction.
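To make the curvature metric concrete, below is a minimal sketch assuming the definition used in Hénaff et al. (2019): curvature is the average angle between successive difference vectors of consecutive token hidden states, so a perfectly straight trajectory has curvature 0. The model choice ("gpt2"), the Hugging Face `transformers` API, and the example sentence are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

def trajectory_curvature(hidden_states: np.ndarray) -> float:
    """Average angle (radians) between consecutive difference vectors.

    hidden_states: array of shape (num_tokens, hidden_dim) for one sentence
    at one layer. Straighter trajectories yield values closer to 0.
    """
    diffs = np.diff(hidden_states, axis=0)                      # steps between consecutive tokens
    diffs = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)  # unit-normalize each step
    cosines = np.sum(diffs[:-1] * diffs[1:], axis=1)            # cos(angle) between adjacent steps
    return float(np.mean(np.arccos(np.clip(cosines, -1.0, 1.0))))

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

inputs = tokenizer("The cat sat quietly on the warm windowsill.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Curvature per layer: index 0 is the embedding output, later indices are transformer blocks.
for layer, h in enumerate(outputs.hidden_states):
    curv = trajectory_curvature(h[0].numpy())
    print(f"layer {layer:2d}: curvature = {np.degrees(curv):.1f} deg")
```

Under the paper's hypothesis, a trained model should show this per-layer curvature dropping from the early to the middle layers, whereas a randomly initialized model should not.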
Publisher
Cold Spring Harbor Laboratory