BCGen: a comment generation method for bytecode-Reference-Cited by-同舟云学术

BCGen: a comment generation method for bytecode

Published:2022-12-10 Issue:1 Volume:30 Page:
ISSN:0928-8910
Container-title:Automated Software Engineering
language:en
Short-container-title:Autom Softw Eng

Author:

Huang Yuan,Huang Jinbo,Chen Xiangping,He Kunning,Zhou Xiaocong

Abstract

AbstractBytecode is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecode is even harder to understand for programmers and researchers. Bytecode has been widely used in various software tasks such as malware detection and clone detection. In order to understand the meaning of the bytecode more quickly and accurately and further help programmers in more software activities, we propose a bytecode comment generation method (called BCGen) using neural language model. Specifically, to get the structured information of the bytecode, we first generate the control flow graph (CFG) of the bytecode, and serialize the CFG with bytecode semantic information. Then a transformer model combining gate recurrent unit is proposed to learn the features of bytecode to generate comments. We obtain the bytecode by building the Jar packages of the well-known open-source projects in the Maven repository and construct a bytecode dataset to train and evaluate our model. Experimental results show that the BLEU of BCGen can reach 0.26, which outperforms several baselines and proves the effectiveness and practicability of our method. It is concluded that it is possible to generate natural language comments directly from the bytecode. Meanwhile, it is important to take structured and semantic information into account in generating bytecode comments.

Publisher

Springer Science and Business Media LLC

Subject

Software

Link

https://link.springer.com/content/pdf/10.1007/s10515-022-00374-6.pdf

Reference62 articles.

1. Allamanis, M., Brockschmidt, M., Khademi, M.: Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740 (2017)

2. Allamanis, M., Peng, H., Sutton, C.: A convolutional attention network for extreme summarization of source code. In: International conference on machine learning, pp. 2091–2100. PMLR (2016)

3. Alon, U., Brody, S., Levy, O., Yahav, E.: code2seq: generating sequences from structured representations of code. arXiv preprint arXiv:1808.01400 (2018)

4. Banerjee, S., Lavie, A.: Meteor: an automatic metric for mt evaluation with improved correlation with human judgments. In: Proceedings of the acl Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)

5. Chen, B., Cherry, C.: A systematic comparison of smoothing techniques for sentence-level bleu. In: Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 362–367 (2014)

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Generating Python Mutants From Bug Fixes Using Neural Machine Translation;IEEE Access;2023