Statement-Grained Hierarchy Enhanced Code Summarization
Published: 2024-02-15
Issue: 4
Volume: 13
Page: 765
ISSN: 2079-9292
Container-title: Electronics
Short-container-title: Electronics
Language: en
Authors: Zhang Qianjin 1, Jin Dahai 1, Wang Yawen 1, Gong Yunzhan 1
Affiliation:
1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract
Code summarization plays a vital role in aiding developers with program comprehension by generating textual descriptions for code snippets. While recent approaches have concentrated on encoding the textual and structural characteristics of source code, they often neglect global hierarchical features, resulting in limited code representations. To address this gap, this paper introduces the statement-grained hierarchy enhanced Transformer model (SHT), a novel framework that integrates global hierarchy, syntax, and token sequences to automatically generate summaries for code snippets. SHT is designed with two encoders to learn both the hierarchical and sequential features of code. A relational attention encoder processes the statement-grained hierarchical graph to produce hierarchical embeddings; a sequence encoder then fuses these hierarchical embeddings with the token sequence. The resulting enriched representation is fed into a vanilla Transformer decoder, which generates concise and informative summaries. Extensive experiments demonstrate that SHT significantly outperforms state-of-the-art approaches on two widely used Java benchmarks, underscoring the effectiveness of incorporating global hierarchical information to improve the quality of code summaries.
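The two-stage encoding described in the abstract can be sketched in miniature. Everything below (the toy statement vectors, the adjacency mask standing in for the statement-grained hierarchy graph, the additive fusion of hierarchical and token embeddings) is an illustrative assumption for exposition, not the authors' implementation:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values, mask=None):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    if mask is not None:  # mask[i] == 0 blocks key i (no hierarchy edge)
        scores = [s if m else float("-inf") for s, m in zip(scores, mask)]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Toy statement embeddings and a hypothetical statement-grained adjacency.
stmts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # which statements may attend to which

# Stage 1: relation-aware encoding over the hierarchy graph
# yields one hierarchical embedding per statement.
hier = [attend(s, stmts, stmts, mask=adj[i]) for i, s in enumerate(stmts)]

# Stage 2: fuse hierarchical embeddings with the token sequence
# (here each token simply adds its enclosing statement's vector).
tok2stmt = [0, 0, 1, 2]          # token index -> enclosing statement
tokens = [[0.5, 0.5]] * len(tok2stmt)
fused = [[t + h for t, h in zip(tokens[j], hier[tok2stmt[j]])]
         for j in range(len(tokens))]
print(len(fused), len(fused[0]))  # one enriched vector per token
```

In the paper this fused representation would then feed a vanilla Transformer decoder; the sketch stops at the encoder side, which is where the statement-grained hierarchy enters.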
References (37 articles; first 5 shown)
1. Wan, Y., Zhao, Z., Yang, M., Xu, G., Ying, H., Wu, J., and Yu, P.S. (2018, January 3–7). Improving automatic source code summarization via deep reinforcement learning. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Montpellier, France.
2. Xia et al. (2017). Measuring program comprehension: A large-scale field study with professionals. IEEE Trans. Softw. Eng.
3. Stapleton, S., Gambhir, Y., LeClair, A., Eberhart, Z., Weimer, W., Leach, K., and Huang, Y. (2020, July 13–15). A human study of comprehension and code summarization. Proceedings of the 28th International Conference on Program Comprehension, Seoul, Republic of Korea.
4. Liu, S., Chen, Y., Xie, X., Siow, J., and Liu, Y. (2020). Retrieval-augmented generation for code summarization via hybrid GNN. arXiv.
5. Iyer, S., Konstas, I., Cheung, A., and Zettlemoyer, L. (2016, August 7–12). Summarizing source code using a neural attention model. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.