Empirical Analysis of Multi-Task Learning for Reducing Identity Bias in Toxic Comment Detection

Published: 2020-05-26 Volume: 14 Pages: 683-693
ISSN: 2334-0770
Container title: Proceedings of the International AAAI Conference on Web and Social Media
Short container title: ICWSM

Author:

Vaidya Ameya, Mai Feng, Ning Yue

Abstract

With the recent rise of toxicity in online conversations on social media platforms, using modern machine learning algorithms for toxic comment detection has become a central focus of many online applications. Researchers and companies have developed a variety of models to identify toxicity in online conversations, reviews, or comments, with mixed success. However, many existing approaches have learned to incorrectly treat non-toxic comments that contain certain trigger words (e.g., gay, lesbian, black, muslim) as potentially toxic. In this paper, we evaluate several state-of-the-art models with a specific focus on reducing model bias towards these commonly attacked identity groups. To reduce this bias, we propose a multi-task learning model with an attention layer that jointly learns to predict the toxicity of a comment and the identities present in it. We then compare our model to an array of shallow and deep-learning models using metrics designed specifically to test for unintended model bias within these identity groups.
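
The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the architecture, the loss, or any hyperparameters, so everything below is an assumption made for illustration: a dot-product attention pooling over token vectors, one sigmoid head for toxicity, one sigmoid head per identity group, and a weighted sum of binary cross-entropy losses. The names `attention_pool`, `joint_loss`, `params`, and `identity_weight` are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(token_vectors, query):
    # Assumed attention form: weights = softmax(query . token),
    # pooled vector = weighted sum of the token vectors.
    weights = softmax([dot(query, t) for t in token_vectors])
    dim = len(token_vectors[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, token_vectors))
              for i in range(dim)]
    return pooled, weights

def bce(p, y, eps=1e-12):
    # Binary cross-entropy for one probability/label pair.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def joint_loss(token_vectors, params, toxic_label, identity_labels,
               identity_weight=0.5):
    # Both tasks share the pooled representation; the auxiliary
    # identity loss is averaged over groups and scaled by a
    # hypothetical weight before being added to the toxicity loss.
    pooled, _ = attention_pool(token_vectors, params["query"])
    p_toxic = sigmoid(dot(params["toxic_head"], pooled))
    p_ids = [sigmoid(dot(h, pooled)) for h in params["identity_heads"]]
    toxic_loss = bce(p_toxic, toxic_label)
    identity_loss = sum(bce(p, y) for p, y in zip(p_ids, identity_labels))
    identity_loss /= len(identity_labels)
    return toxic_loss + identity_weight * identity_loss
```

Setting `identity_weight=0.0` reduces the sketch to a single-task toxicity classifier, which is one way to compare the two setups under these assumptions.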

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Cited by 16 articles.

1. Empirical Study and Mitigation Methods of Bias in LLM-Based Robots; Academic Journal of Science and Technology; 2024-08-20

2. Regional Bias in Monolingual English Language Models; 2024-01-24

3. A Deep Learning Framework for Assamese Toxic Comment Detection: Leveraging LSTM and BiLSTM Models with Attention Mechanism; Lecture Notes in Networks and Systems; 2024

4. Classification of Toxic Comments on Social Networks Using Machine Learning; Communications in Computer and Information Science; 2024

5. Beyond plain toxic: building datasets for detection of flammable topics and inappropriate statements; Language Resources and Evaluation; 2023-10-21