1. AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models;Wanjun Zhong;arXiv,2023
2. Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too;Jose Hernandez-Orallo;Journal of Logic, Language and Information,2000
3. Are Emergent Abilities of Large Language Models a Mirage?;Rylan Schaeffer;arXiv,2023