BACKGROUND
Individuals from minoritized racial and ethnic backgrounds suffer from pernicious and pervasive health disparities that have emerged, in part, from clinician bias.
OBJECTIVE
We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this approach, we also assessed the extent to which clinicians perceive these linguistic markers to be indicative of bias.
METHODS
In this cross-sectional study, we extracted EHR notes for patients 18 years of age or older who were diagnosed with type 2 diabetes and received care from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics between 2006 and 2015. Patient race and ethnicity were categorized as ‘White Non-Hispanic,’ ‘Black Non-Hispanic,’ or ‘Hispanic/Latino’. We hypothesized that SEANCE (Sentiment Analysis and Social Cognition Engine) components (i.e., negative adjectives, positive adjectives, joy, fear and disgust, politics, respect, trust verbs, well-being) and mean word count would be indicators of bias if racial and ethnic differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to rate, on a scale of 1 to 10 (with 10 being extremely indicative of bias), the extent to which they thought variation in the use of each SEANCE language domain across racial and ethnic groups was reflective of bias in EHR notes.
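As a rough illustration of what a per-note linguistic marker looks like, the sketch below counts hits from a small word list in each note and averages the resulting rate within each group. It is a simplified stand-in under stated assumptions, not the study's pipeline: the word list, the group labels, the example notes, and the function names `marker_rate` and `mean_rate_by_group` are all hypothetical, SEANCE's own lexicons and component scores are not reproduced, and the plain group averages here omit the mixed effects modeling and age adjustment described above.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical mini-lexicon for illustration only; this is NOT a SEANCE word list.
NEGATIVE_ADJECTIVES = {"noncompliant", "difficult", "poor", "uncooperative"}


def marker_rate(note, lexicon):
    """Return lexicon hits per 100 words for one note (0.0 for an empty note)."""
    words = re.findall(r"[a-z']+", note.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in lexicon)
    return 100.0 * hits / len(words)


def mean_rate_by_group(notes, lexicon):
    """Average the per-note rate within each group label.

    `notes` is an iterable of (group_label, note_text) pairs.
    """
    rates = defaultdict(list)
    for group, text in notes:
        rates[group].append(marker_rate(text, lexicon))
    return {group: mean(values) for group, values in rates.items()}


# Invented example notes with placeholder group labels (assumption, not study data).
example_notes = [
    ("group_a", "Patient is noncompliant with diet and has poor glycemic control."),
    ("group_a", "Patient reports taking metformin daily."),
    ("group_b", "Patient doing well on current regimen."),
]

if __name__ == "__main__":
    print(mean_rate_by_group(example_notes, NEGATIVE_ADJECTIVES))
```

Normalizing by note length keeps long and short notes comparable, which matters here because word count is itself one of the outcomes examined.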
RESULTS
We examined EHR notes (n = 12,905) of Black Non-Hispanic, White Non-Hispanic, and Hispanic/Latino patients (n = 1,562), who were seen by 281 physicians. Twenty-seven clinicians participated in the validation study. Participants rated negative adjectives as 8.63 (SD=2.06), fear and disgust as 8.11 (SD=2.15), and positive adjectives as 7.93 (SD=2.46). Notes for Black Non-Hispanic patients contained significantly more negative adjectives (coeff=0.07, SE=0.02) and significantly more fear and disgust words (coeff=0.007, SE=0.002) compared to the notes for White Non-Hispanic patients. The notes for Hispanic/Latino patients included significantly fewer positive adjectives (coeff=-0.02, SE=0.007), trust verbs (coeff=-0.009, SE=0.004), and joy words (coeff=-0.03, SE=0.01) compared to the notes for White Non-Hispanic patients.
CONCLUSIONS
If validated, this approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.