Affiliation:
1. Center for Complex Systems and Enterprises, Stevens Institute of Technology, Hoboken, NJ 07030, USA
Abstract
Despite the growing capabilities of large language models, concerns persist about the biases they develop. In this paper, we propose a novel, automated debiasing mechanism based on targeted dataset augmentation, viewed through the lens of bias producers, that can be useful in a variety of industries, especially “restricted” ones with limited data. We consider two sources of bias, intrinsic model architecture and dataset quality, and evaluate each with a metric we created. We show that our dataset augmentation algorithm reduces bias as measured by both metrics. Our code is available in a public GitHub repository.
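The abstract does not detail the augmentation algorithm itself, so the following is only a minimal illustrative sketch of the general class of technique it alludes to: counterfactual augmentation that balances a corpus over terms associated with bias producers. The term pairs, function names, and swapping rule below are assumptions for illustration, not the paper's method.

```python
# Illustrative sketch only; the paper's augmentation algorithm is not
# reproduced here. Shown: generic counterfactual-swap augmentation,
# a common debiasing-by-augmentation approach. SWAP_PAIRS and augment()
# are hypothetical names chosen for this example.
import re

# Hypothetical pairs of terms to balance in the training corpus.
SWAP_PAIRS = [("he", "she"), ("man", "woman"), ("his", "her")]

def _swap(token: str, a: str, b: str) -> str:
    """Swap one matched token with its counterpart, preserving case."""
    target = b if token.lower() == a else a
    return target.capitalize() if token[0].isupper() else target

def augment(sentence: str) -> list[str]:
    """Return the original sentence plus counterfactual variants in which
    each paired term is swapped, so both variants appear in the data."""
    variants = [sentence]
    for a, b in SWAP_PAIRS:
        pattern = re.compile(rf"\b({a}|{b})\b", flags=re.IGNORECASE)
        if pattern.search(sentence):
            variants.append(pattern.sub(lambda m: _swap(m.group(0), a, b), sentence))
    return variants

if __name__ == "__main__":
    corpus = ["He is a doctor.", "The nurse said she would help."]
    augmented = [v for s in corpus for v in augment(s)]
    print(augmented)
```

In a limited-data, restricted-domain setting, an augmentation step of this kind enlarges the corpus without collecting new data, which is consistent with the motivation stated in the abstract.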
Cited by 1 article.