Affiliation:
1. Department of Orthopedic Surgery, NYU Langone Health, New York, New York, USA
Abstract
Background
Large language models (LLMs) have unknown implications for medical research. This study assessed whether LLM‐generated abstracts are distinguishable from human‐written abstracts and compared their perceived quality.
Methods
The LLM ChatGPT was used to generate 20 arthroplasty abstracts (AI‐generated) from full‐text manuscripts. These were compared with the originally published abstracts (human‐written). Six blinded orthopaedic surgeons rated the abstracts on overall quality, communication, and confidence in the authorship source. Authorship‐confidence scores were compared with a test value representing complete inability to discern authorship.
Results
Human‐written abstracts received modestly higher confidence in human authorship than AI‐generated abstracts (p = 0.028). However, authorship‐confidence scores for AI‐generated abstracts were statistically consistent with an inability to discern authorship (p = 0.999). Overall abstract quality was rated higher for human‐written abstracts (p = 0.019).
Conclusions
Absolute authorship‐confidence ratings for AI‐generated abstracts showed that reviewers had difficulty discerning authorship, but these abstracts did not achieve the perceived quality of human‐written abstracts. Caution is warranted when incorporating LLMs into scientific writing.
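The Methods compare authorship‐confidence scores with a fixed test value that represents complete inability to discern authorship. The abstract names neither the statistical test nor the rating scale, so the following Python sketch assumes a one-sample t-test and a hypothetical 1 to 5 scale on which 3 means "cannot tell". The function name `one_sample_t` and the ratings used below are illustrative only and are not study data.

```python
import math
import statistics


def one_sample_t(scores, test_value):
    """Return (t statistic, degrees of freedom) for H0: mean(scores) == test_value.

    Assumption: a one-sample t-test; the source abstract does not name its test.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("ratings have zero variance; t is undefined")
    standard_error = sd / math.sqrt(n)
    return (mean - test_value) / standard_error, n - 1


# Hypothetical ratings on an assumed 1-5 scale where 3 = "cannot tell".
toy_ratings = [3, 2, 4, 3, 3, 4, 2, 3]
t_stat, df = one_sample_t(toy_ratings, 3)
print(t_stat, df)
```

Here the toy ratings average exactly 3, so t = 0.0 with 7 degrees of freedom, the pattern expected when raters cannot discern authorship. A two-sided p-value would then come from the t distribution with n − 1 degrees of freedom; if SciPy is available, `scipy.stats.ttest_1samp(scores, popmean)` performs the whole test in one call.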