Abstract
Generative Pre-trained Transformers (GPT) are powerful language models with great potential to transform biomedical research. However, they are known to produce artificial hallucinations: answers that appear correct but are in fact false. We developed GeneTuring, a comprehensive question-answering (QA) database of 600 genomics questions, and manually scored 10,800 answers returned by six GPT models, including GPT-3, ChatGPT, and New Bing. New Bing has the best overall performance and significantly reduces AI hallucination compared with the other models, thanks to its ability to recognize when it cannot answer a question. We argue that improving this incapacity awareness is as important as improving model accuracy for addressing AI hallucination.
Publisher
Cold Spring Harbor Laboratory
Cited by 22 articles.