Author:
Phongwattana Thiptanawat,Chan Jonathan H
Abstract
The rapid growth of biomedical literature presents a significant challenge for researchers to extract and analyze relevant information efficiently. In this study, we explore the application of GPT, the large language model to automate the extraction and visualization of metabolic networks from a corpus of PubMed abstracts. Our objective is to provide a valuable tool for biomedical researchers to explore and understand the intricate metabolic interactions discussed in scientific literature. We begin by splitting a ton of the tokens within the corpus, as the GPT-3.5-Turbo model has a token limit of 4,000 per analysis. Through iterative prompt optimization, we successfully extract a comprehensive list of metabolites, enzymes, and proteins from the abstracts. To validate the accuracy and completeness of the extracted entities, our biomedical data domain experts compare them with the provided abstracts and ensure a fully matched result. Using the extracted entities, we generate a directed graph that represents the metabolic network including 3 types of metabolic events that consist of metabolic consumption, metabolic reaction, and metabolic production. The graph visualization, achieved through Python and NetworkX, offers a clear representation of metabolic pathways, highlighting the relationships between metabolites, enzymes, and proteins. Our approach integrates language models and network analysis, demonstrating the power of combining automated information extraction with sophisticated visualization techniques. The research contributions are twofold. Firstly, we showcase the ability of GPT-3.5-Turbo to automatically extract metabolic entities, streamlining the process of cataloging important components in metabolic research. Secondly, we present the generation and visualization of a directed graph that provides a comprehensive overview of metabolic interactions. This graph serves as a valuable tool for further analysis, comparison with existing pathways, and updating or refining metabolic networks. Our findings underscore the potential of large language models and network analysis techniques in extracting and visualizing metabolic information from scientific literature. This approach enables researchers to gain insights into complex biological systems, advancing our understanding of metabolic pathways and their components.
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Recent Advances in Large Language Models for Healthcare;BioMedInformatics;2024-04-16
2. Analyzing the Future of ChatGPT in Medical Research;Artificial Intelligence Applications Using ChatGPT in Education;2023-09-15