Gender-sensitive word embeddings for healthcare

Published: 2021-12-16 Issue: 3 Volume: 29 Page: 415-423
ISSN: 1067-5027
Container-title: Journal of the American Medical Informatics Association
Language: en
Author:

Agmon Shunit¹, Gillis Plia², Horvitz Eric³, Radinsky Kira¹

Affiliation:

1. Computer Science Faculty, Technion - Israel Institute of Technology, Haifa, Israel

2. Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

3. Microsoft Research, Redmond, WA, USA

Abstract

Objective: To analyze gender bias in clinical trials, to design an algorithm that mitigates the effects of gender-representation biases on natural language processing (NLP) systems trained on text drawn from clinical trials, and to evaluate its performance.

Materials and Methods: We analyze gender bias in clinical trials described by 16 772 PubMed abstracts (2008–2018). We present a method to augment word embeddings, the core building block of NLP-centric representations, by weighting abstracts by the number of women participants in the trial. We evaluate the resulting gender-sensitive embeddings' performance on several clinical prediction tasks: comorbidity classification, hospital length of stay prediction, and intensive care unit (ICU) readmission prediction.

Results: For female patients, the gender-sensitive model achieves an area under the receiver operating characteristic curve (AUROC) of 0.86 versus the baseline of 0.81 for comorbidity classification, a mean absolute error of 4.59 versus the baseline of 4.66 for length of stay prediction, and an AUROC of 0.69 versus 0.67 for ICU readmission. All results are statistically significant.

Discussion: Women have been underrepresented in clinical trials. Thus, using the broad clinical trials literature as training data for statistical language models could result in biased models, with deficits in knowledge about women. The method presented enables gender-sensitive use of publications as training data for word embeddings. In experiments, the gender-sensitive embeddings show better performance than baseline embeddings for the clinical tasks studied. The results highlight opportunities for recognizing and addressing gender and other representational biases in the clinical trials literature.

Conclusion: Addressing representational biases in data for training NLP embeddings can lead to better results on downstream tasks for underrepresented populations.
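The abstract does not spell out the exact weighting scheme, so the following is only a minimal sketch of the general idea of weighting abstracts by women's participation. It assumes, purely for illustration, that each abstract's contribution to word co-occurrence counts is scaled by the fraction of women among its trial participants; the function name `weighted_cooccurrence`, the input layout, and the fractional weight are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def weighted_cooccurrence(abstracts, window=2):
    """Accumulate word co-occurrence counts, scaled per abstract.

    `abstracts` is a list of (tokens, n_women, n_total) tuples.
    ASSUMPTION: the weight is the fraction of women participants;
    the paper's actual weighting may differ.
    """
    counts = defaultdict(Counter)
    for tokens, n_women, n_total in abstracts:
        if n_total <= 0:
            continue  # skip abstracts with no usable participant counts
        weight = n_women / n_total  # hypothetical per-abstract weight
        for i, center in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[center][tokens[j]] += weight
    return counts


# Toy corpus: two made-up "abstracts" with different shares of women.
corpus = [
    (["aspirin", "reduces", "pain"], 80, 100),
    (["aspirin", "reduces", "risk"], 10, 100),
]
cooc = weighted_cooccurrence(corpus, window=1)
```

Under this assumption, contexts drawn from trials with more women participants carry more weight in whatever embedding model is later fit to the counts.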

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

https://academic.oup.com/jamia/article-pdf/29/3/415/42333196/ocab279.pdf

Cited by 1 article.

1. Bias, coronavirus, nationality, gender and neurology article citation count prediction with machine learning;Neurology Perspectives;2023-01