Interpretable Bias Mitigation for Textual Data: Reducing Genderization in Patient Notes While Maintaining Classification Performance

Published: 2022-10-31 Issue: 4 Volume: 3 Page: 1-41
ISSN: 2691-1957
Container-title: ACM Transactions on Computing for Healthcare
Language: en
Short-container-title: ACM Trans. Comput. Healthcare

Author:

Minot Joshua R.¹, Cheney Nicholas¹, Maier Marc², Elbers Danne C.³, Danforth Christopher M.¹, Dodds Peter Sheridan¹

Affiliation:

1. University of Vermont, Burlington, VT, USA

2. MassMutual, MA, USA

3. University of Vermont, Burlington, VT and VA Cooperative Studies Program, VA Boston Healthcare System, USA

Abstract

Medical systems in general, and patient treatment decisions and outcomes in particular, can be affected by bias based on gender and other demographic elements. As language models are increasingly applied to medicine, there is a growing interest in building algorithmic fairness into processes impacting patient care. Much of the work addressing this question has focused on biases encoded in language models: statistical estimates of the relationships between concepts derived from distant reading of corpora. Building on this work, we investigate how differences in gender-specific word frequency distributions and language models interact with regard to bias. We identify and remove gendered language from two clinical-note datasets and describe a new debiasing procedure using BERT-based gender classifiers. We show minimal degradation in health condition classification tasks for low- to medium-levels of dataset bias removal via data augmentation. Finally, we compare the bias semantically encoded in the language models with the bias empirically observed in health records. This work outlines an interpretable approach for using data augmentation to identify and reduce biases in natural language processing pipelines.

Funder

Vermont Advanced Computing Core and financial support from the Massachusetts Mutual Life Insurance Company and Google

Publisher

Association for Computing Machinery (ACM)

Subject

Health Information Management, Health Informatics, Computer Science Applications, Biomedical Engineering, Information Systems, Medicine (miscellaneous), Software

Link

https://dl.acm.org/doi/pdf/10.1145/3524887

Cited by 6 articles.

1. Measuring and Mitigating Gender Bias in Legal Contextualized Language Models;ACM Transactions on Knowledge Discovery from Data;2023-10-18

2. Allotaxonometry and rank-turbulence divergence: a universal instrument for comparing complex systems;EPJ Data Science;2023-09-19

3. Blinding to Circumvent Human Biases: Deliberate Ignorance in Humans, Institutions, and Machines;Perspectives on Psychological Science;2023-09-05

4. Characterizing Bias in Word Embeddings Towards Analyzing Gender Associations in Philippine Texts;2023 IEEE World Conference on Applied Intelligence and Computing (AIC);2023-07-29

5. Artificial intelligence bias in medical system designs: a systematic review;Multimedia Tools and Applications;2023-07-22