Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code?

Published: 2024-07-12
Journal: Proceedings of the ACM on Software Engineering (Proc. ACM Softw. Eng.)
Volume: 1, Issue: FSE, Pages: 2261-2284
ISSN: 2994-970X
Language: English

Authors:

Kou Bonan¹, Chen Shengmai¹, Wang Zhijie², Ma Lei³, Zhang Tianyi¹

Affiliations:

1. Purdue University, West Lafayette, USA

2. University of Alberta, Edmonton, Canada

3. The University of Tokyo, Tokyo, Japan / University of Alberta, Edmonton, Canada

Abstract

Large Language Models (LLMs) have recently been widely used for code generation. Due to the complexity and opacity of LLMs, little is known about how these models generate code. We made the first attempt to bridge this knowledge gap by investigating whether LLMs attend to the same parts of a task description as human programmers during code generation. An analysis of six LLMs, including GPT-4, on two popular code generation benchmarks revealed a consistent misalignment between LLMs' and programmers' attention. We manually analyzed 211 incorrect code snippets and found five attention patterns that can be used to explain many code generation errors. Finally, a user study showed that model attention computed by a perturbation-based method is often favored by human programmers. Our findings highlight the need for human-aligned LLMs for better interpretability and programmer trust.
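The abstract does not spell out which perturbation method or alignment measure the study used. Purely as an illustration of the general idea, the sketch below assumes occlusion (leave-one-word-out) as the perturbation and, as the alignment measure, the fraction of programmer-highlighted words that also land in the model's top-k. The scorer `toy_score`, its weights, the example description, and the human annotation are all hypothetical stand-ins; in a real setting the score would come from the LLM itself (for example, the likelihood of its generated code given the perturbed description). This is not the paper's implementation.

```python
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical stand-in for a model-derived score. A real setup would query the
# LLM instead, e.g. the likelihood of its generated code given the prompt.
TOY_WEIGHTS: Dict[str, float] = {"sorted": 3.0, "reverse": 2.0, "list": 1.0, "return": 0.5}


def toy_score(words: Sequence[str]) -> float:
    """Hypothetical scorer: sums fixed per-word weights (unknown words count 0)."""
    return sum(TOY_WEIGHTS.get(w, 0.0) for w in words)


def occlusion_importance(
    words: Sequence[str], score: Callable[[Sequence[str]], float]
) -> List[float]:
    """Leave-one-word-out importance: the score drop when each word is deleted."""
    words = list(words)
    base = score(words)
    importances = []
    for i in range(len(words)):
        perturbed = words[:i] + words[i + 1:]  # description with word i removed
        importances.append(base - score(perturbed))
    return importances


def top_k_words(words: Sequence[str], importances: Sequence[float], k: int) -> Set[str]:
    """Return the k most important words (ties keep their original order)."""
    ranked = sorted(range(len(words)), key=lambda i: importances[i], reverse=True)
    return {words[i] for i in ranked[:k]}


def human_overlap(model_top: Set[str], human_marked: Set[str]) -> float:
    """Fraction of human-highlighted words that also appear in the model's top-k."""
    if not human_marked:
        return 0.0
    return len(model_top & human_marked) / len(human_marked)


description = "return the list sorted in reverse order".split()
importances = occlusion_importance(description, toy_score)
model_top = top_k_words(description, importances, k=2)
human_marked = {"sorted", "reverse", "order"}  # hypothetical programmer annotation
alignment = human_overlap(model_top, human_marked)

print(importances)  # [0.5, 0.0, 1.0, 3.0, 0.0, 2.0, 0.0]
print(sorted(model_top), round(alignment, 3))  # ['reverse', 'sorted'] 0.667
```

Occlusion is only one of many perturbation schemes, and a top-k overlap ratio is only one possible way to compare model and programmer attention; both were chosen here because they are easy to verify by hand.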

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3660807

References (103 articles)

1. 2022. CodeParrot. https://github.com/huggingface/transformers/tree/main/examples/research_projects/codeparrot

2. 2023. ChatGPT. http://chat.openai.com

3. 2023. GPT-4 Parameters: Unlimited guide NLP’s Game-Changer. https://medium.com/@mlubbad/the-ultimate-guide-to-gpt-4-parameters-everything-you-need-to-know-about-nlps-game-changer-109b8767855a

4. Unified Pre-training for Program Understanding and Generation

5. Alex Andonian and Quentin Anthony. 2021. GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch. https://doi.org/10.5281/zenodo.5879544