Affiliation:
1. Jiangxi Normal University, School of Computer Information Engineering, China
2. CSIRO’s Data61, Australia
3. Jiangxi University of Technology, China
Abstract
APIs have intricate relations that can be described in text and represented as knowledge graphs to aid software engineering tasks. Existing relation extraction methods have limitations, such as a limited API text corpus and sensitivity to the characteristics of the input text. To address these limitations, we propose using large language models (LLMs) (e.g., GPT-3.5) as a neural knowledge base for API relation inference. This approach leverages the entire Web used to pre-train LLMs as a knowledge base and is insensitive to the context and complexity of the input texts. To ensure accurate inference, we design an AI chain consisting of three AI modules: API Fully Qualified Name (FQN) Parser, API Knowledge Extractor, and API Relation Decider. The accuracies of the API FQN Parser and API Relation Decider are 0.81 and 0.83, respectively. Using the generative capacity of the LLM and the inference capability of our approach, we achieve an average F1 score of 0.76 across the three datasets, significantly higher than the state-of-the-art method's average F1 of 0.40. Compared to the original CoT and modularized CoT methods, our AI chain design improves the performance of API relation inference by 71% and 49%, respectively. Meanwhile, the prompt ensembling strategy enhances the performance of our approach by 32%. The API relations inferred by our method can be further organized into structured forms to support other software engineering tasks.
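The abstract outlines an AI chain of three LLM-backed modules followed by prompt ensembling. The sketch below illustrates how such a chain could be wired together; the module names follow the paper, but the `chat` helper, the prompt wording, and the majority-vote ensembling are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of a three-module AI chain for API relation inference.
# Assumptions (not from the paper): the `chat` helper that wraps an LLM call,
# the exact prompt wording, and majority-vote prompt ensembling.
from collections import Counter
from typing import Callable

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def parse_fqn(chat: LLM, mention: str, context: str) -> str:
    """API FQN Parser: resolve a simple API mention to its fully qualified name."""
    prompt = (f"Given the sentence:\n{context}\n"
              f"Return the fully qualified name of the API '{mention}'.")
    return chat(prompt).strip()


def extract_knowledge(chat: LLM, fqn: str) -> str:
    """API Knowledge Extractor: elicit the LLM's knowledge about the API."""
    prompt = f"Describe the purpose, parameters, and behavior of the API {fqn}."
    return chat(prompt).strip()


def decide_relation(chat: LLM, fqn1: str, know1: str,
                    fqn2: str, know2: str, relation: str) -> str:
    """API Relation Decider: judge whether the candidate relation holds."""
    prompt = (f"API 1: {fqn1}\nKnowledge: {know1}\n\n"
              f"API 2: {fqn2}\nKnowledge: {know2}\n\n"
              f"Does the relation '{relation}' hold between API 1 and API 2? "
              f"Answer yes or no.")
    return chat(prompt).strip().lower()


def infer_relation(chat: LLM, mention1: str, mention2: str,
                   context: str, relation: str, n_prompts: int = 3) -> bool:
    """Run the AI chain end to end; ensemble n_prompts decisions by majority vote."""
    fqn1 = parse_fqn(chat, mention1, context)
    fqn2 = parse_fqn(chat, mention2, context)
    know1 = extract_knowledge(chat, fqn1)
    know2 = extract_knowledge(chat, fqn2)
    votes = Counter(decide_relation(chat, fqn1, know1, fqn2, know2, relation)
                    for _ in range(n_prompts))
    return votes.most_common(1)[0][0].startswith("yes")
```

Splitting the task into separate prompts keeps each LLM call focused on one sub-problem, which is the rationale the abstract gives for the AI chain over a single CoT prompt.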
Publisher
Association for Computing Machinery (ACM)