BACKGROUND
Generative AI has attracted considerable attention in the medical field, yet its potential is constrained by inherent limitations: it responds to inputs by predicting the next word from its memory-based archive of training data. We aim to explore some of these constraints from a medical education and psychological perspective, using Bloom's taxonomy.
OBJECTIVE
To assess the cognitive functions of AI in the medical field by examining its performance on medical licensing examinations through the lens of Bloom's taxonomy.
METHODS
Questions from the Taiwan Medical Licensing Examination (TMLE, August 2022) and Step 3 of the United States Medical Licensing Examination (USMLE, August 2022) were classified according to Bloom's taxonomy levels. Each question was submitted as an individual prompt, entered separately into ChatGPT-3.5 and ChatGPT-4 using different accounts. After each response, the chat log was erased and the session reset to ensure the independence of each answer. Responses from ChatGPT-3.5 and ChatGPT-4, collected between January and February 2024, were analyzed. The questions from both examinations were publicly available online during the study period.
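For readers who wish to reproduce a comparable one-question-per-fresh-conversation protocol programmatically, the minimal sketch below uses the OpenAI Python API; this is an assumption for illustration only, since the study itself used the ChatGPT web interface with separate accounts and manual chat deletion. The model identifiers and placeholder question text are illustrative, not taken from the study.

```python
# Hypothetical sketch: one exam question per fresh, single-turn conversation,
# mirroring the study's practice of erasing chat logs after each response.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

MODELS = ["gpt-3.5-turbo", "gpt-4"]  # illustrative stand-ins for ChatGPT-3.5 / ChatGPT-4


def ask_once(model: str, question: str) -> str:
    """Send a single question with no prior context, so no conversation
    history carries over between answers."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


# Collect independent answers from both models for each question.
questions = ["<TMLE or USMLE Step 3 question text>"]  # placeholder
answers = {m: [ask_once(m, q) for q in questions] for m in MODELS}
```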
RESULTS
Although the overall performance of ChatGPT-4 surpassed that of ChatGPT-3.5, analysis of both models' responses across cognitive levels revealed no significant correlation between performance and Bloom's taxonomy level. This lack of significance persisted even when questions at the "remember" level, where the models' extensive training databases would be expected to confer an advantage, were compared against all other cognitive levels grouped as "non-remember."
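The abstract does not specify the statistical test used for the "remember" versus "non-remember" comparison; one conventional way to examine such a dichotomy is a chi-square test of independence on correct/incorrect counts, sketched below. The contingency counts are placeholders for illustration and are not the study's data.

```python
# Illustrative sketch (not the study's analysis): chi-square test of
# independence on accuracy by taxonomy category.
from scipy.stats import chi2_contingency

#                correct, incorrect  (hypothetical counts)
contingency = [[40, 10],   # "remember" questions
               [55, 20]]   # "non-remember" questions

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# A p-value above the chosen alpha (e.g., 0.05) would be consistent with the
# reported finding that accuracy does not differ significantly by level.
```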
CONCLUSIONS
In the medical field, ChatGPT models may rely on their "remember" function to answer questions across all categories of Bloom's taxonomy. Further research is needed on different model versions, medical specialties, and question difficulty as assessed by individuals from various backgrounds.