Abstract
In vivo experiments increasingly use clinical score sheets to ensure minimal distress to the animals. A score sheet is a document listing specific symptoms, behaviours, and intervention guidelines, balanced to allow an objective clinical assessment of experimental animals. Artificial Intelligence (AI) technologies are increasingly applied in preclinical research, not only in analysis but also in documentation processes, reflecting a significant shift towards more technologically advanced research methodologies. The present study explores the application of Large Language Models (LLMs) in generating score sheets for animal welfare assessment in a preclinical research setting. Focusing on a mouse model of inflammatory bowel disease, the study evaluates the performance of three LLMs – ChatGPT-4, ChatGPT-3.5, and Google Bard – in creating clinical score sheets based on specified criteria such as weight loss, stool consistency, and visible fecal blood. Key parameters evaluated include consistency of structure, accuracy in representing severity levels, and appropriateness of intervention thresholds. The findings reveal a duality in LLM-generated score sheets: while some LLMs consistently structure their outputs effectively, all models exhibit notable variation in assigning numerical values to symptoms and in defining intervention thresholds accurately. This underscores the dual nature of AI performance in this field: its potential to create useful foundational drafts, and the critical need for professional review to ensure precision and reliability. The results highlight the importance of balancing AI-generated tools with expert oversight in preclinical research.
Publisher
Springer Science and Business Media LLC