ChatGPT usage in the Reactome curation process

Authors:

Tiwari Krishna, Matthews Lisa, May Bruce, Shamovsky Veronica, Orlic-Milacic Marija, Rothfels Karen, Ragueneau Eliot, Gong Chuqiao, Stephan Ralf, Li Nancy, Wu Guanming, Stein Lincoln, D’Eustachio Peter, Hermjakob Henning

Abstract

Given the rapid advancement and ubiquity of generative AI, particularly ChatGPT, a chatbot built on large language models such as GPT, we explored the potential application of ChatGPT in the data collection and annotation stages of the Reactome curation process. The aim was to create an automated or semi-automated framework that reduces the extensive manual effort traditionally required to gather and annotate information on biological pathways, following the Reactome “reaction-centric” approach. In this pilot study, we used ChatGPT/GPT4 to address gaps in pathway annotation and enrichment in parallel with the conventional manual curation process. This allowed a comparative analysis in which we assessed the outputs generated by ChatGPT against manually extracted information. The primary objective of this comparison was to ascertain the efficiency of integrating ChatGPT or other large language models into the Reactome curation workflow and to help plan our annotation pipeline, ultimately improving our protein-to-pathway association in a reliable and automated or semi-automated way. In the process, we identified promising capabilities and inherent challenges in using ChatGPT/GPT4, both in general and specifically in the context of Reactome curation. We describe approaches and tools for refining ChatGPT/GPT4 output so that it is more accurate and detailed.

Publisher

Cold Spring Harbor Laboratory

