Generative artificial intelligence GPT-4 accelerates knowledge mining and machine learning for synthetic biology

Published: 2023-06-14

Author:

Xiao Zhengyang, Li Wenyu, Moon Hannah, Roell Garrett W., Chen Yixin, Tang Yinjie J.

Abstract

Knowledge mining from synthetic biology journal articles for machine learning (ML) applications is a labor-intensive process. The development of natural language processing (NLP) tools, such as GPT-4, can accelerate the extraction of published information related to microbial performance under complex strain engineering and bioreactor conditions. As a proof of concept, we used GPT-4 to extract knowledge from 176 publications on two oleaginous yeasts (Yarrowia lipolytica and Rhodosporidium toruloides). After integration with a molecule inventory database, the outcome is a total of 2037 data instances and 28 features, which serve as machine learning inputs. The structured datasets enabled ML approaches (e.g., a random forest model) to predict Yarrowia fermentation titers with high accuracy (R² of 0.86 for unseen test data). Via transfer learning, the trained model could also assess the production capability of the non-conventional yeast, R. toruloides, for which there are fewer published reports. This work demonstrated the potential of generative artificial intelligence to speed up information extraction from research articles, thereby improving design-build-test-learn (DBTL) cycles for commercial biomanufacturing development.

Publisher

Cold Spring Harbor Laboratory

References: 43 articles.

1. Artificial intelligence: a solution to involution of design–build–test–learn cycle

2. Integrated knowledge mining, genome-scale modeling, and machine learning for predicting Yarrowia lipolytica bioproduction; Metabolic Engineering, 2021

3. Evaluating Factors That Influence Microbial Synthesis Yields by Linear Regression with Numerical and Ordinal Variables

4. Statistics-based model for prediction of chemical biosynthesis yield from Saccharomyces cerevisiae

5. Machine learning-informed and synthetic biology-enabled semi-continuous algal cultivation to unleash renewable fuel productivity; Nature Communications, 2022

Cited by 3 articles.

1. Integration of genetic engineering and multi-factor fermentation optimization for co-production of carotenoid and DHA in Schizochytrium sp;Bioresource Technology;2024-02

2. GPT-Assisted Learning of Structure–Property Relationships by Graph Neural Networks: Application to Rare-Earth-Doped Phosphors;The Journal of Physical Chemistry Letters;2023-12-08

3. Analyzing the Future of ChatGPT in Medical Research;Artificial Intelligence Applications Using ChatGPT in Education;2023-09-15