Affiliation:
1. Division of Artificial Intelligence, University of Science and Technology, Daejeon, Republic of Korea
2. Artificial Intelligence Computing Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
Abstract
Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric that captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic in order to handle the repetitive work involved in generating prompts, conducting inferences, and executing the generated code. A preliminary evaluation focusing on prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the metric.
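The metric described in the abstract rewards partial correctness at the level of individual test cases rather than scoring each problem as a binary pass or fail. As a rough illustration only (the metric's exact name and definition are not reproduced on this page, so the function names and scoring below are assumptions rather than the authors' implementation), a per-problem score can be computed as the fraction of test cases a generated solution passes and then averaged over the benchmark:

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TestCase:
    # A single LeetCode-style test case: input arguments and expected output.
    input_args: tuple
    expected: object


def pass_ratio(candidate: Callable, test_cases: List[TestCase]) -> float:
    # Hypothetical sketch: fraction of test cases the generated solution passes.
    if not test_cases:
        return 0.0
    passed = 0
    for case in test_cases:
        try:
            if candidate(*case.input_args) == case.expected:
                passed += 1
        except Exception:
            # Runtime errors count as failed cases rather than aborting the run.
            pass
    return passed / len(test_cases)


def average_pass_ratio(ratios: List[float]) -> float:
    # Summarize a benchmark run by averaging per-problem pass ratios.
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    # Toy example: a solution that is wrong only for a >= 2 passes 2 of 3 cases,
    # scoring about 0.67 here, whereas a binary per-problem metric would give 0.
    cases = [TestCase((1, 2), 3), TestCase((0, 0), 0), TestCase((2, 2), 4)]
    buggy_add = lambda a, b: a + b if a < 2 else a + b + 1
    print(pass_ratio(buggy_add, cases))

The point of such a test-case-level score is the finer granularity it gives over binary pass/fail: nearly correct solutions are distinguished from completely wrong ones, which is the property the abstract attributes to the proposed metric.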
Funder
National Research Council of Science and Technology
Cited by
1 article.