1. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation;Shuai;arXiv,2021
2. A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends;Zibin;arXiv,2024
3. BLEU
4. CodeBLEU: A Method for Automatic Evaluation of Code Synthesis;Shuo;arXiv,2020