Abstract
Bioinformatics plays a crucial role in understanding biological phenomena, yet the exponential growth of biological data and rapid technological advances have raised the barriers to in-depth exploration of this domain. We therefore propose Bio-Informatics Agent (BIA), an intelligent agent built on Large Language Models (LLMs), to facilitate autonomous bioinformatic analysis through natural language. The primary functionalities of BIA encompass extracting and processing raw data and metadata, and querying both locally deployed and public databases for information. It further formulates workflow designs, generates executable code, and delivers comprehensive reports. Focusing on single-cell RNA sequencing (scRNA-seq) data, this paper demonstrates BIA's proficiency in information processing and analysis, as well as in executing sophisticated tasks and interactions. Additionally, we analyze failed executions of the agent and demonstrate prospective enhancement strategies, including self-refinement and domain adaptation. The future outlook includes expanding BIA's practical implementations across multi-omics data, to alleviate the workload of the bioinformatics community and to enable deeper investigations into the life sciences. BIA is available at: https://github.com/biagent-dev/biagent.
Publisher
Cold Spring Harbor Laboratory