A Testing Framework for AI Linguistic Systems (testFAILS)

Published:2023-07-17 Issue:14 Volume:12 Page:3095
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Kumar Yulia¹, Morreale Patricia¹, Sorial Peter¹, Delgado Justin¹, Li J. Jenny¹, Martins Patrick¹

Affiliation:

1. Department of Computer Science and Technology, Kean University, Union, NJ 07083, USA

Abstract

This paper presents an innovative testing framework, testFAILS, designed for the rigorous evaluation of AI Linguistic Systems (AILS), with particular emphasis on the various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, “How should AI be evaluated?” While the Turing test has traditionally been the benchmark for AI evaluation, it is argued that current, publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing-test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Ongoing research has already validated several versions of ChatGPT, and comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA and PaLM 2 models, is currently being conducted. The testFAILS framework is designed to be adaptable, ready to evaluate new chatbot versions as they are released. Additionally, available chatbot APIs have been tested and applications have been developed, one of them being AIDoctor, presented in this paper, which utilizes the ChatGPT-4 model and Microsoft Azure AI technologies.

Funder

NSF

Kean University

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/14/3095/pdf

References: 68 articles.

1. Surameery (2023). Use chat gpt to solve programming bugs. Int. J. Inf. Technol. Comput. Eng. (IJITC).

2. Google Bard Generated Literature Review: Metaverse (2023). J. AI.

3. Lopezosa, C. (2023). Bing chat: Hacia una nueva forma de entender las búsquedas. Anuario ThinkEPI, 17.

4. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., and Azhar, F. (2023). Llama: Open and efficient foundation language models. arXiv.

5. Anil, R., Dai, A.M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., and Chen, Z. (2023). Palm 2 technical report. arXiv.

Cited by 8 articles.

1. Applying Swin Architecture to Diverse Sign Language Datasets;Electronics;2024-04-16

2. Robust Testing of AI Language Model Resiliency with Novel Adversarial Prompts;Electronics;2024-02-22

3. ChatGPT Translation of Program Code for Image Sketch Abstraction;Applied Sciences;2024-01-24

4. Transformers and LLMs as the New Benchmark in Early Cancer Detection;ITM Web of Conferences;2024

5. Love the Way You Lie: Unmasking the Deceptions of LLMs;2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security Companion (QRS-C);2023-10-22