Clinical Research With Large Language Models Generated Writing—Clinical Research with AI-assisted Writing (CRAW) Study

Authors:

Huespe Ivan A.1,2; Echeverri Jorge3; Khalid Aisha4; Carboni Bisso Indalecio1; Musso Carlos G.1,5; Surani Salim6,7; Bansal Vikas6; Kashyap Rahul6,8

Affiliations:

1. Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.

2. Universidad de Buenos Aires, Buenos Aires, Argentina.

3. Universidad Javeriana, Bogotá, Colombia.

4. Harvard Medical School, Boston, MA.

5. Facultad de Ciencias de la Salud, Universidad Simon Bolivar, Barranquilla, Colombia.

6. Mayo Clinic, Rochester, MN.

7. Texas A&M University, College Station, TX.

8. WellSpan Health, York, PA.

Abstract

IMPORTANCE: The scientific community debates the article quality, authorship merit, originality, and ethical use of Generative Pre-trained Transformer (GPT)-3.5 in scientific writing.

OBJECTIVES: To assess GPT-3.5's ability to craft the background section of a critical care clinical research question, compared with medical researchers with H-indices of 22 and 13.

DESIGN: Observational cross-sectional study.

SETTING: Researchers from 20 countries across six continents evaluated the backgrounds.

PARTICIPANTS: Researchers with a Scopus H-index greater than 1 were included.

MAIN OUTCOMES AND MEASURES: We generated the background section of a critical care clinical research question on "acute kidney injury in sepsis" using three different methods: a researcher with an H-index greater than 20, a researcher with an H-index greater than 10, and GPT-3.5. The three background sections were presented in a blinded survey to researchers with H-indices ranging from 1 to 96. First, the researchers rated the main components of each background on a 5-point Likert scale. Second, they were asked to identify whether each background was written by humans alone or with large language model-based tools.

RESULTS: A total of 80 researchers completed the survey. The median H-index was 3 (interquartile range, 1–7.25), and the largest share of researchers (36%) were from the critical care specialty. Compared with the researchers with H-indices of 22 and 13, GPT-3.5 was rated higher on the Likert scale for the main background components (median 4.5 vs. 3.82 vs. 3.6 vs. 4.5, respectively; p < 0.001). The sensitivity and specificity for detecting researcher-written versus GPT-3.5-generated writing were poor: 22.4% and 57.6%, respectively.

CONCLUSIONS AND RELEVANCE: GPT-3.5 could create background research content indistinguishable from the writing of a medical researcher. It was rated higher than medical researchers with H-indices of 22 and 13 in writing the background section of a critical care clinical research question.

Funder

Ben Barres Spotlight Awards from eLIFE community

Publisher

Ovid Technologies (Wolters Kluwer Health)

Subject

Critical Care and Intensive Care Medicine

References: 21 articles.
