Data Science Using OpenAI: Testing Their New Capabilities Focused on Data Science

Author:

Guerra Pires Jorge

Abstract

Introduction: Although statistics is used across many academic disciplines, including the life sciences, many researchers who lack statistical training struggle to apply statistical analysis correctly, which leads to fundamental errors in their work. Because statistics is both complex and central to scientific research, researchers from many backgrounds need a tool that lets them carry out sound statistical analysis without being experts in the field. This paper introduces OpenAI's latest API capability, the Code Interpreter tool, and evaluates its potential to meet this need.

Methods: Code Interpreter is designed to understand natural-language instructions, process CSV data files, and perform statistical analyses by selecting appropriate methods and libraries. Unlike traditional statistical software, it requires minimal input from the user, often just a straightforward question or command. We tested the API on real datasets to demonstrate its capabilities, focusing on its ease of use for non-statisticians and on its potential to improve research output, particularly in evidence-based medicine.

Results: Code Interpreter used open-source Python libraries, which offer extensive resources for data science, to execute statistical analyses accurately on the provided datasets. Practical examples, including a study of diabetic patients, showed how the API can help non-expert researchers interpret and use data in their research.

Discussion: Integrating AI-based tools such as OpenAI's Code Interpreter into the research process could substantially change how scientific data are analyzed. By lowering the barrier to advanced statistics, it lets researchers focus on substantive research questions, including in fields such as evidence-based medicine, where many researchers are also practicing physicians. This paper highlights the potential for these tools to be adopted broadly by novices and experts alike, improving the overall quality of statistical analysis in scientific research. We advocate wider adoption of this technology as a step toward democratizing access to sophisticated statistical inference and data analysis.
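
As a rough illustration of the kind of analysis the abstract describes (reading a CSV file and answering a statistical question about it), the following minimal sketch uses only the Python standard library to load a small CSV and compare two groups with Welch's t-test. This is the sort of code OpenAI's Code Interpreter might generate for a question such as "Does fasting glucose differ between treated and control patients?". The column names, the values, and the choice of Welch's t-test are hypothetical assumptions made for illustration. They are not the diabetes dataset analyzed in the paper, and this is not output produced by the API.

```python
import csv
import io
import math
from statistics import mean, variance

# HYPOTHETICAL CSV content: column names and values are made up for
# illustration and are NOT the diabetes dataset used in the paper.
CSV_TEXT = """group,glucose_mg_dl
treated,110
treated,118
treated,105
treated,121
control,132
control,127
control,140
control,135
"""


def load_groups(text, group_col, value_col):
    """Parse CSV text and collect the numeric column for each group label."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return groups


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    # Squared standard error of each mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


groups = load_groups(CSV_TEXT, "group", "glucose_mg_dl")
t, df = welch_t(groups["treated"], groups["control"])
print(f"t = {t:.2f}, df = {df:.1f}")  # prints: t = -4.38, df = 5.5
```

The standard library has no t-distribution CDF, so this sketch stops at the statistic and degrees of freedom. A library call such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and also returns the p-value, which is the kind of open-source Python library the abstract says the tool relies on.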

Publisher:

Qeios Ltd
