
Towards standarized benchmarks of LLMs in software modeling tasks: a conceptual framework

Published: 2024-09-03
ISSN: 1619-1366
Journal: Software and Systems Modeling
Language: English
Abbreviated journal title: Softw Syst Model

Authors:

Javier Cámara, Lola Burgueño, Javier Troya

Abstract

The integration of Large Language Models (LLMs) into software modeling tasks presents both opportunities and challenges. This Expert Voice addresses a significant gap in the evaluation of these models by advocating standardized benchmarking frameworks. Recognizing the potential variability in prompt strategies, LLM outputs, and the solution space, we propose a conceptual framework to assess the quality of LLM-generated software models. This framework aims to pave the way for a standardized benchmarking process, ensuring consistent and objective evaluation of LLMs in software modeling. The framework is illustrated using UML class diagrams as a running example.

Funder

Universidad de Málaga

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10270-024-01206-9.pdf

References (12 articles)

1. Fan, A., Gokkaya, B., Harman, M., Lyubarskiy, M., Sengupta, S., Yoo, S., Zhang, J.M.: Large language models for software engineering: Survey and open problems (2023)

2. Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., Wang, H.: Large language models for software engineering: A systematic literature review (2023)

3. Cámara, J., Troya, J., Burgueño, L., Vallecillo, A.: On the assessment of generative AI in modeling tasks: an experience report with ChatGPT and UML. Softw. Syst. Model. 22(3), 781–793 (2023). https://doi.org/10.1007/S10270-023-01105-5

4. Ozkaya, I.: Application of large language models to software engineering tasks: Opportunities, risks, and implications. IEEE Software 40(3), 4–8 (2023). https://doi.org/10.1109/MS.2023.3248401

5. Austin, J., Odena, A., Nye, M.I., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C.J., Terry, M., Le, Q.V., Sutton, C.: Program synthesis with large language models. CoRR abs/2108.07732 (2021). https://arxiv.org/abs/2108.07732