Affiliation:
1. Department of Computer Software Engineering, National University of Sciences and Technology, Islamabad 44000, Pakistan
Abstract
To speed up software implementation, developers frequently reuse existing code snippets found by searching large codebases. Existing code search tools often rely on keyword-based or syntax-based matching and struggle to capture the semantics and intent behind code snippets. In this paper, we propose C2B, a novel hybrid model that combines CodeT5 and bidirectional long short-term memory (Bi-LSTM) for source code search and recommendation. The model leverages CodeT5's domain-specific pretraining and Bi-LSTM's contextual understanding to improve code representation and capture sequential dependencies. As a proof of concept, we implemented C2B as a deep neural code search tool and empirically evaluated it on the large-scale CodeSearchNet dataset. The experimental results show that our approach retrieves relevant code snippets and outperforms prior state-of-the-art techniques.
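The retrieval step of a neural code search tool can be illustrated with a minimal sketch. The abstract does not state how queries and snippets are scored against each other, so cosine similarity over embedding vectors is an assumption here (it is a common choice in neural code search). The vectors, snippet names, and the helpers `cosine` and `rank_snippets` are hypothetical placeholders; in the proposed model the embeddings would come from the CodeT5 and Bi-LSTM encoder rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_snippets(query_vec, snippet_vecs, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in snippet_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-made embeddings standing in for encoder outputs.
snippet_vecs = {
    "read_file": [0.9, 0.1, 0.0],
    "sort_list": [0.0, 0.8, 0.6],
    "http_get": [0.1, 0.0, 0.95],
}

# Hypothetical embedding of a natural-language query such as "open a file and read it".
query_vec = [0.85, 0.2, 0.05]

ranking = rank_snippets(query_vec, snippet_vecs, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these placeholder vectors the query lies closest to `read_file`, so that snippet is ranked first. In a real system the ranking quality depends entirely on how well the encoder places semantically related queries and code near each other.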