Affiliation:
1. Division of Artificial Intelligence, University of Science and Technology, Daejeon, Republic of Korea
2. Artificial Intelligence Computing Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
Abstract
Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric that captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic in order to handle the repetitive work involved in generating prompts, conducting inferences, and executing the generated code. A preliminary evaluation focusing on prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the metric.
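The metric described in the abstract rewards partial correctness at the level of individual test cases rather than scoring each problem as a binary pass or fail. As a rough illustration only (the metric's exact name and definition are not reproduced on this page, so the function names and scoring below are assumptions rather than the authors' implementation), a per-problem score can be computed as the fraction of test cases a generated solution passes and then averaged over the benchmark:

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TestCase:
    # A single LeetCode-style test case: input arguments and expected output.
    input_args: tuple
    expected: object


def pass_ratio(candidate: Callable, test_cases: List[TestCase]) -> float:
    # Hypothetical sketch: fraction of test cases the generated solution passes.
    if not test_cases:
        return 0.0
    passed = 0
    for case in test_cases:
        try:
            if candidate(*case.input_args) == case.expected:
                passed += 1
        except Exception:
            # Runtime errors count as failed cases rather than aborting the run.
            pass
    return passed / len(test_cases)


def average_pass_ratio(ratios: List[float]) -> float:
    # Summarize a benchmark run by averaging per-problem pass ratios.
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    # Toy example: a solution that is wrong only for a >= 2 passes 2 of 3 cases,
    # scoring about 0.67 here, whereas a binary per-problem metric would give 0.
    cases = [TestCase((1, 2), 3), TestCase((0, 0), 0), TestCase((2, 2), 4)]
    buggy_add = lambda a, b: a + b if a < 2 else a + b + 1
    print(pass_ratio(buggy_add, cases))

The point of such a test-case-level score is the finer granularity it gives over binary pass/fail: nearly correct solutions are distinguished from completely wrong ones, which is the property the abstract attributes to the proposed metric.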
Funder
National Research Council of Science and Technology
Cited by
1 article.