Author:
CHENG Yu, HUANG Guanming, WU Yishun, ZHAO Zijie, HE Zhenhao, LU Jiaxing
Abstract
Inferring the fully qualified names (FQNs) of undeclared receiving objects and non-fully-qualified type names (non-FQNs) in partial code is critical for effectively searching, understanding, and reusing partial code. Existing type inference tools, such as COSTER and SNR, rely on a symbolic knowledge base and adopt a dictionary-lookup strategy to map the simple names of undeclared receiving objects and non-FQNs to FQNs. However, building a symbolic knowledge base requires parsing compilable code files, which limits the APIs and code contexts that can be collected and leads to out-of-vocabulary (OOV) failures. To overcome the limitations of a symbolic knowledge base for FQN inference, we implemented Ask Me Any Type (AMAT), a type inference plugin embedded in web browsers and integrated development environments (IDEs). Instead of the dictionary-lookup strategy, AMAT uses a cloze-style fill-in-the-blank strategy for type inference. By treating code as text, AMAT leverages a fine-tuned large language model (LLM) as a neural knowledge base, thereby eliminating the need for code compilation. Experimental results show that AMAT outperforms state-of-the-art tools such as COSTER and SNR. In practice, developers can directly reuse partial code by inferring the FQNs of unresolved type names in real time.
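The contrast between the two strategies can be illustrated with a minimal Python sketch. It is not AMAT's actual implementation: the toy knowledge base, the `<mask>` placeholder, the prompt format, and the helper names `lookup_fqn` and `make_cloze_prompt` are all assumptions made for illustration. The dictionary lookup fails on a simple name it has never seen, whereas the cloze strategy only rewrites the code text into a fill-in-the-blank prompt that a language model could then complete.

```python
import re

# Hypothetical toy symbolic knowledge base: simple name -> known FQNs.
# Real tools build this by parsing compilable code; this one is illustrative.
SYMBOLIC_KB = {
    "List": ["java.util.List"],
    "File": ["java.io.File"],
}

# Assumed placeholder; the actual mask token depends on the model used.
MASK = "<mask>"


def lookup_fqn(simple_name, kb=SYMBOLIC_KB):
    """Dictionary-lookup strategy: return None for an out-of-vocabulary name."""
    candidates = kb.get(simple_name)
    return candidates[0] if candidates else None


def make_cloze_prompt(code, simple_name, mask=MASK):
    """Cloze strategy: put a blank for the package qualifier in front of the
    first whole-word occurrence of the simple name, treating code as text."""
    pattern = r"\b" + re.escape(simple_name) + r"\b"
    return re.sub(pattern, lambda m: f"{mask}.{m.group(0)}", code, count=1)


snippet = "Gson gson = new Gson();"
print(lookup_fqn("Gson"))                  # OOV for the toy knowledge base
print(make_cloze_prompt(snippet, "Gson"))  # prompt a model would fill in
```

In this sketch the lookup returns `None` for `Gson` because the name is absent from the toy knowledge base, while the cloze prompt can still be built without parsing or compiling the snippet.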