1. Daman Arora, Himanshu Gaurav Singh, et al. 2023. Have LLMs advanced enough? A challenging problem solving benchmark for large language models. arXiv preprint arXiv:2305.15074 (2023).
2. Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, et al. 2021. Program synthesis with large language models. arXiv preprint arXiv:2108.07732 (2021).
3. Baidu. 2024. ERNIE Bot. https://yiyan.baidu.com/welcome. (Accessed on 05/18/2024).
4. Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).
5. An Empirical Study on Deployment Faults of Deep Learning Based Mobile Applications