Abstract
Advancements in high-throughput sequencing technologies and artificial intelligence offer unprecedented opportunities for groundbreaking discoveries in bioinformatics research. However, the exponential growth of omics data and the rapid development of artificial intelligence technologies demand automated analysis of big biological data and scientific insight driven by interdisciplinary knowledge. Here we propose a data-intelligence-intensive bioinformatics copilot (Bio-Copilot) system that synergizes AI capabilities with human expertise to facilitate hypothesis-free exploratory research and inspire novel scientific insights in large-scale omics studies. Bio-Copilot forms high-quality intensive intelligence through close collaboration between human experts and multiple agents driven by large language models (LLMs). To augment the capabilities of Bio-Copilot, this study devises an agent group management strategy, an effective human-agent interaction mechanism, a shared interdisciplinary knowledge database, and continuous learning strategies for the agents. We comprehensively compare Bio-Copilot against GPT-4o and several leading AI agents across diverse bioinformatics tasks, using a broad range of evaluation metrics. Bio-Copilot achieves state-of-the-art overall performance across all tasks while showcasing exceptional task completeness. Furthermore, when applied to constructing a large-scale human lung cell atlas, Bio-Copilot not only reproduces the intricate data integration process detailed in a seminal study but also introduces a hierarchical annotation strategy that captures the continuous nature of cellular states and uncovers the characteristics of rare cell types, highlighting its potential to unravel hidden complexities in biological systems.
Beyond the technical achievements, this study also underscores the profound implications of integrating AI capabilities with expert knowledge in accelerating impactful biological discoveries and exploring uncharted territories in life sciences.
Publisher
Cold Spring Harbor Laboratory