Reward modeling for mitigating toxicity in transformer-based language models

Published: 2022-07-20 Issue: 7 Volume: 53 Page: 8421-8435
ISSN: 0924-669X
Container-title: Applied Intelligence
Language: en
Short-container-title: Appl Intell

Author:

Faal Farshid, Schmitt Ketra, Yu Jia Yuan

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s10489-022-03944-z.pdf

Reference: 45 articles.

1. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training

2. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9

3. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008

4. Zhang Y, Sun S, Galley M, Chen Y-C, Brockett C, Gao X, Gao J, Liu J, Dolan WB (2020) DialoGPT: Large-scale generative pre-training for conversational response generation. In: Proceedings of the 58th annual meeting of the association for computational linguistics: system demonstrations, pp 270–278

5. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp 4171–4186

Cited by 6 articles.

1. Large language models in psychiatry: Opportunities and challenges;Psychiatry Research;2024-09

2. Don’t Ignore the Drive of Curiosity: Rethinking Subtleties Between Universality of Commonsense Knowledge and Excellence of Large Language Models;SN Computer Science;2024-08-15

3. A Systematic Review of Toxicity in Large Language Models: Definitions, Datasets, Detectors, Detoxification Methods and Challenges;2024-07-15

4. GalaxyGPT: A Hybrid Framework for Large Language Model Safety;IEEE Access;2024

5. A Machine Learning Approach for Moderating Toxic Hinglish Comments of YouTube Videos;Lecture Notes in Networks and Systems;2024