BACKGROUND
In medical education, particularly in anatomy and dermatology, generative artificial intelligence (AI) can be used to create customized illustrations. However, darker skin tones are underrepresented in medical textbooks and other sources that serve as training data for AI, which poses a significant challenge to ensuring diverse and inclusive educational materials.
OBJECTIVE
This study aims to evaluate the extent of skin tone diversity in AI-generated medical images and to test whether the representation of skin tones can be improved by modifying AI prompts to better reflect the demographic makeup of the US population.
METHODS
Two standard AI models (Dall-E 3 and Midjourney) each generated 100 images of people with psoriasis. In addition, a custom model was developed that incorporated a prompt injection aimed at “forcing” the AI (Dall-E 3) to reflect the skin tone distribution of the US population according to the 2012 American National Election Survey; this custom model generated a further 100 images. Three researchers rated the skin tone in each image using the New Immigrant Survey skin tone scale, with the median of the three ratings representing each image. A chi-square goodness-of-fit test compared the skin tone distribution of each image set to that of the US population.
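The goodness-of-fit comparison above can be sketched in a few lines of Python. This is a minimal illustration only: the category bins, observed counts, and population proportions below are hypothetical placeholders, not the study's data.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic for observed counts
    against expected population proportions (which must sum to 1)."""
    n = sum(observed)
    expected = [p * n for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: 100 generated images binned into 5 skin tone
# categories, skewed toward lighter tones, vs illustrative population shares.
observed = [62, 21, 9, 5, 3]
expected_props = [0.40, 0.20, 0.15, 0.15, 0.10]

stat = chi_square_gof(observed, expected_props)
# df = 5 - 1 = 4; the critical value at alpha = 0.05 is about 9.488,
# so a statistic above that indicates a significant mismatch.
print(f"chi2 = {stat:.2f}, significant: {stat > 9.488}")
```

In practice a library routine (e.g., `scipy.stats.chisquare`) would also return the exact P value; the hand-rolled version here just makes the computation explicit.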
RESULTS
The standard AI models (Dall-E 3 and Midjourney) demonstrated a significant difference between the expected skin tones of the US population and the observed tones in the generated images (P=8.62E-11 and P=1.12E-21, respectively); both over-represented lighter skin. Conversely, the custom model with the modified prompt yielded a distribution of skin tones that closely matched the expected demographic representation, showing no significant difference (P=0.0435).
CONCLUSIONS
This study reveals a notable bias in AI-generated medical images, which predominantly underrepresent darker skin tones. This bias can be effectively addressed by modifying AI prompts to incorporate real-life demographic distributions. The findings emphasize the need for conscious efforts in AI development to ensure diverse and representative outputs, particularly in educational and medical contexts. Users of generative AI tools should be aware that these biases exist, and that similar tendencies may also exist in other types of generative AI (e.g., large language models) and in other characteristics (e.g., sex/gender, culture/ethnicity). Injecting demographic data into AI prompts can effectively counteract these biases, ensuring a more accurate representation of the general population.
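The prompt-injection idea described above can be sketched as follows. This is a hypothetical illustration, not the study's actual implementation: the category labels, proportions, and function names are invented for demonstration, and a real target distribution would come from survey data such as the one used in the study.

```python
import random

# Hypothetical target distribution of skin tone descriptors.
# The labels and proportions here are illustrative placeholders only.
SKIN_TONE_DISTRIBUTION = {
    "very light skin": 0.17,
    "light skin": 0.36,
    "medium skin": 0.25,
    "dark skin": 0.15,
    "very dark skin": 0.07,
}

def inject_skin_tone(base_prompt, rng=random):
    """Append a skin tone descriptor sampled from the target distribution,
    so that over many generations the outputs approximate that distribution."""
    tones = list(SKIN_TONE_DISTRIBUTION)
    weights = list(SKIN_TONE_DISTRIBUTION.values())
    tone = rng.choices(tones, weights=weights, k=1)[0]
    return f"{base_prompt}, a person with {tone}"

prompt = inject_skin_tone("clinical photograph of psoriasis plaques on the forearm")
print(prompt)
```

Sampling per request, rather than hard-coding one descriptor, is what lets the aggregate distribution of generated images converge toward the population distribution.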