BACKGROUND
Natural language processing models are in wide and growing use in clinical and healthcare domains. Such applications enable the scalable, efficient delivery of health information, but their effectiveness is prone to equity gaps across demographics and contexts. These models are only as good as the data they are trained on, the training procedure, and the model parameters. Moreover, they are highly sensitive to latent demographic signals such as gender, age, nationality, and native language. Applications built on biased components produce inequitable outcomes, and these accessibility challenges are more prevalent in rural regions of the world.
OBJECTIVE
This paper describes and evaluates a novel active learning approach for incrementally improving the accuracy of a natural language processing (NLP) model while optimizing for gender-equitable outcomes in healthcare systems. The approach employs an iterative cyclic model comprising data annotation using NLP, human auditing to improve annotation accuracy (especially for data with demographic segmentation), testing on new data (with an intentional bias favoring underperforming demographics), and a loopback step that retrains the model and applies it to new data.
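As a rough illustration of the audit step in this cycle, the sketch below prioritizes examples from an under-performing demographic group for human correction. All function names, data, and the audit budget are hypothetical assumptions, not the authors' implementation, and model retraining is elided.

```python
# Illustrative sketch only: the names, data, and budget are hypothetical.

def audit_increment(pred, gold, group, target_group, budget):
    """Audit up to `budget` model predictions, prioritizing the target
    (under-performing) demographic group; return corrected labels that
    would feed the retraining step of the cycle."""
    # Stable sort puts the target group's indices first in the audit queue.
    order = sorted(range(len(pred)), key=lambda i: group[i] != target_group)
    out = list(pred)
    for i in order[:budget]:
        out[i] = gold[i]  # the human auditor supplies the true label
    return out

def recall_by_group(pred, gold, group):
    """Per-group recall of the positive class (label 1)."""
    stats = {}
    for g in set(group):
        tp = sum(1 for p, y, gr in zip(pred, gold, group)
                 if gr == g and y == 1 and p == 1)
        pos = sum(1 for y, gr in zip(gold, group) if gr == g and y == 1)
        stats[g] = tp / pos if pos else 0.0
    return stats
```

For example, with eight all-positive gold labels split across groups "A" and "B", auditing three examples with `target_group="B"` raises group B's recall while leaving group A untouched, mimicking one increment of the loop.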
METHODS
We describe the experimental integration of an audit tool and workflow with distinct NLP tasks in two separate contexts: (i) annotation of medical symptoms, collected in Hausa and English, from responses to a research questionnaire about health access in Northern Nigeria; (ii) intent classification of spontaneous user messages, in English and Swahili, sent to a health guide chatbot in Nigeria and Kenya.
RESULTS
Baseline results showed an equity gap in both precision (P) and recall (R): P=.725 and R=.676 for the over-represented class versus P=.669 and R=.651 for the under-represented class. Applying the active learning tool and workflow mitigated this gap after three increments of auditing and retraining (P=.721 and R=.760 for the under-represented class).
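For reference, the baseline equity gap implied by these figures, taken here as the simple difference in P and R between the two classes, works out as follows (the computation itself is an illustration, not part of the reported method):

```python
# Reported baseline scores from the results above.
over = {"P": 0.725, "R": 0.676}    # over-represented class
under = {"P": 0.669, "R": 0.651}   # under-represented class

# Per-metric gap between the two classes.
gap = {m: round(over[m] - under[m], 3) for m in ("P", "R")}
# gap == {"P": 0.056, "R": 0.025}
```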
CONCLUSIONS
Our findings indicate that this gender-aware audit workflow is language-agnostic and can mitigate demographic inequity while improving overall system accuracy.