The risk of racial bias while tracking influenza-related content on social media using machine learning

Published:2021-01-23 Issue:4 Volume:28 Page:839-849
ISSN:1527-974X
Container-title:Journal of the American Medical Informatics Association
language:en

Author:

Lwowski Brandon¹, Rios Anthony¹

Affiliation:

1. Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, Texas, USA

Abstract

Objective: Machine learning is used to understand and track influenza-related content on social media. Because these systems are used at scale, they have the potential to adversely impact the people they are built to help. In this study, we explore the biases of different machine learning methods for the specific task of detecting influenza-related content. We compare the performance of each model on tweets written in Standard American English (SAE) vs African American English (AAE).

Materials and Methods: Two influenza-related datasets are used to train 3 text classification models (support vector machine, convolutional neural network, bidirectional long short-term memory) with different feature sets. The datasets match real-world scenarios in which there is a large imbalance between SAE and AAE examples. The proportion of AAE examples for each class ranges from 2% to 5% in both datasets. We also evaluate each model's performance using a balanced dataset via undersampling.

Results: We find that all of the tested machine learning methods are biased on both datasets. The difference in false positive rates between SAE and AAE examples ranges from 0.01 to 0.35. The difference in false negative rates ranges from 0.01 to 0.23. We also find that the neural network methods generally have more unfair results than the linear support vector machine on the chosen datasets.

Conclusions: The models that result in the most unfair predictions may vary from dataset to dataset. Practitioners should be aware of the potential harms related to applying machine learning to health-related social media data. At a minimum, we recommend evaluating fairness along with traditional evaluation metrics.
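The fairness measure reported in the abstract, the difference in false positive and false negative rates between SAE and AAE tweets, can be illustrated with a minimal sketch. This is not the authors' code: the function names `group_error_rates` and `rate_gaps`, the binary label convention (1 = influenza-related), and the toy data are all assumptions made for illustration, and this record does not state whether the paper computes the rates in exactly this way.

```python
from collections import Counter


def group_error_rates(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Assumes binary labels where 1 means "influenza-related" (hypothetical
    convention; the source record does not specify the encoding).
    """
    counts = Counter()
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            counts[(group, "tp" if pred == 1 else "fn")] += 1
        else:
            counts[(group, "fp" if pred == 1 else "tn")] += 1

    rates = {}
    for group in set(groups):
        negatives = counts[(group, "fp")] + counts[(group, "tn")]
        positives = counts[(group, "fn")] + counts[(group, "tp")]
        # A rate is undefined when a group has no examples of that class.
        fpr = counts[(group, "fp")] / negatives if negatives else float("nan")
        fnr = counts[(group, "fn")] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


def rate_gaps(rates, group_a="SAE", group_b="AAE"):
    """Absolute FPR and FNR differences between two dialect groups."""
    fpr_gap = abs(rates[group_a][0] - rates[group_b][0])
    fnr_gap = abs(rates[group_a][1] - rates[group_b][1])
    return fpr_gap, fnr_gap


# Toy data (invented for illustration only, not from the paper).
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
groups = ["SAE"] * 6 + ["AAE"] * 4

rates = group_error_rates(y_true, y_pred, groups)
fpr_gap, fnr_gap = rate_gaps(rates)
print(f"FPR gap: {fpr_gap:.2f}, FNR gap: {fnr_gap:.2f}")
```

A gap of 0 on both rates would mean the classifier makes each kind of error equally often for both groups. Reporting these gaps next to accuracy or F1 is one way to follow the abstract's recommendation to evaluate fairness along with traditional metrics.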

Funder

National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

https://academic.oup.com/jamia/article-pdf/28/4/839/36642084/ocaa326.pdf

References: 65 articles.

1. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing;Ferretti;Science,2020

2. COVID-19 mobile positioning data contact tracing and patient privacy regulations: exploratory search of global response strategies and the use of digital tools in Nigeria;Ekong;JMIR Mhealth Uhealth,2020

3. Influenza A (H7N9) and the importance of digital epidemiology;Salathé;N Engl J Med,2013

Cited by 17 articles.

1. Analytics and intelligence for public health surveillance;Modernizing Global Health Security to Prevent, Detect, and Respond;2024

2. Codified Racism in Digital Health Platforms: A Meta-Analysis of COVID-19 Prediction Algorithms and their Policy Implications;2023-09-25

3. Resampling for Mitigating Bias in Predictive Model for Substance Use Disorder Treatment Completion;2023 IEEE 11th International Conference on Healthcare Informatics (ICHI);2023-06-26

4. Automatic Push System for New Media Information Dissemination based on Neural Network Algorithm;2023 International Conference on Applied Intelligence and Sustainable Computing (ICAISC);2023-06-16

5. A Comparative Study of Fairness in Medical Machine Learning;2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI);2023-04-18