Abstract
Background
We aimed to evaluate the baseline performance and improvement of ChatGPT-4 “omni” (ChatGPT-4o) and Gemini 1.5 Flash (Gemini 1.5) in answering multiple-choice questions related to pediatric nephrology after specific training.
Methods
Using questions from the “Educational Review” articles published in Pediatric Nephrology between January 2014 and April 2024, the models were tested both before and after specific training. The training material consisted of the Educational Review articles in Portable Document Format (PDF) and plain-text (TXT) formats, with the last page of each article, which contained the correct answers, removed using a Python script. The number of correct answers was recorded.
Results
Before training, ChatGPT-4o correctly answered 75.2% of the 1395 questions, outperforming Gemini 1.5, which answered 64.9% correctly (p < 0.001). After training with PDF files, ChatGPT-4o’s accuracy increased to 77.8%, while Gemini 1.5 improved significantly to 84.7% (p < 0.001). Training with TXT files showed similar results, with ChatGPT-4o maintaining 77.8% accuracy and Gemini 1.5 further improving to 87.6% (p < 0.001).
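Comparisons of accuracy rates like those above are typically tested with a two-proportion z-test. The sketch below shows that calculation using only the Python standard library; the correct-answer counts are reconstructed from the reported percentages (75.2% and 64.9% of 1395 questions) and are an assumption, and the test used in the actual study is not specified here.

```python
# Two-sided z-test for the difference between two proportions,
# using only the standard library (counts below are reconstructed
# from the reported percentages, not taken from the study's data).
from math import sqrt, erf

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                     # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Baseline comparison: ~1049/1395 (75.2%) vs ~905/1395 (64.9%)
z, p = two_proportion_z(1049, 1395, 905, 1395)
```

With these reconstructed counts the resulting p-value falls well below 0.001, consistent with the significance level reported for the baseline comparison.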
Conclusions
The study highlights that while ChatGPT-4o has strong baseline performance, specific training does not significantly enhance its accuracy. Conversely, Gemini 1.5, despite its lower initial performance, improves substantially with training, particularly with TXT files. These findings suggest that Gemini 1.5 is better at storing and retrieving supplied information, making it potentially more effective in clinical applications, although it depends on additional data for optimal performance.
Graphical Abstract
Funder
Università degli Studi della Campania Luigi Vanvitelli
Publisher
Springer Science and Business Media LLC