A hybrid model combining evolutionary probability and machine learning leverages data-driven protein engineering

Published: 2022-06-07

Author:

Illig Alexander-Maurice, Siedhoff Niklas E., Schwaneberg Ulrich, Davari Mehdi D.

Abstract

Protein engineering through directed evolution and (semi-)rational approaches has been applied successfully to optimize protein properties for broad applications in molecular biology, biotechnology, and biomedicine. The potential of protein engineering is not yet fully realized because limited screening throughput hampers efficient exploration of the vast protein sequence space. Data-driven strategies have emerged as a powerful tool for protein engineering: they provide a model of the sequence-fitness landscape that can be explored exhaustively in silico and capitalize on the high diversity potential offered by nature. However, because both the quality and the quantity of the input data determine the success of such approaches, the applicability of data-driven strategies is often limited by sparse data. Here, we present a hybrid model that combines direct coupling analysis and machine learning techniques to enable data-driven protein engineering when only a few labeled sequences are available. Our method achieves high performance in predicting a protein's fitness from its sequence regardless of the number of sequence-fitness pairs in the training dataset. Besides reducing the computational effort compared to state-of-the-art methods, it outperforms them in sparse-data situations, i.e., when 50–250 labeled sequences are available for training. In essence, the developed method is promising for data-driven protein engineering, especially for protein engineers who have access to only a limited amount of data for sequence-fitness landscape modeling.
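The general idea of blending an unsupervised evolutionary score with a supervised regressor can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the two components are combined, so everything below is an assumption made for illustration. In particular, a site-independent log-frequency score stands in for direct coupling analysis (it ignores pairwise couplings), the supervised part is a plain ridge regression on one-hot encodings, and the two are blended by least squares. The function names (`site_log_freqs`, `fit_hybrid`, and so on) and the toy data are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(seq):
    """Flattened one-hot encoding of an amino-acid sequence."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def site_log_freqs(msa, pseudocount=1.0):
    """Per-position log frequencies from aligned, unlabeled homologs.

    Assumption: a simple stand-in for an evolutionary model. Unlike direct
    coupling analysis, it does not estimate pairwise couplings.
    """
    counts = np.full((len(msa[0]), len(ALPHABET)), pseudocount)
    for seq in msa:
        for pos, aa in enumerate(seq):
            counts[pos, AA_INDEX[aa]] += 1.0
    return np.log(counts / counts.sum(axis=1, keepdims=True))


def evo_score(seq, log_freqs):
    """Sum of per-position log frequencies for one sequence."""
    return sum(log_freqs[pos, AA_INDEX[aa]] for pos, aa in enumerate(seq))


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data: (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def fit_hybrid(train_seqs, train_fitness, msa, alpha=1.0):
    """Fit both components and return a predict(seq) function."""
    log_freqs = site_log_freqs(msa)
    X = np.array([one_hot(s) for s in train_seqs])
    y = np.asarray(train_fitness, dtype=float)
    w, b = fit_ridge(X, y, alpha)

    # Hypothetical blending step: least-squares weights for the evolutionary
    # score, the ridge prediction, and a constant offset.
    evo = np.array([evo_score(s, log_freqs) for s in train_seqs])
    features = np.column_stack([evo, X @ w + b, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(features, y, rcond=None)

    def predict(seq):
        row = np.array([evo_score(seq, log_freqs), one_hot(seq) @ w + b, 1.0])
        return float(beta @ row)

    return predict


# Toy data, invented purely to show the call pattern.
msa = ["ACDE", "ACDF", "ACEE", "GCDE"]
train_seqs = ["ACDE", "ACDF", "GCDE", "ACEE"]
train_fitness = [1.0, 0.5, 0.2, 0.8]

predict = fit_hybrid(train_seqs, train_fitness, msa)
print(predict("ACDF"))
```

In this sketch the evolutionary term needs no fitness labels, which is why such a component can remain informative when only a few labeled sequences exist; the supervised term then adjusts for the property actually measured.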

Publisher

Cold Spring Harbor Laboratory

Cited by 10 articles.

1. Biophysics-based protein language models for protein engineering;2024-03-17

2. Interpretable and explainable predictive machine learning models for data-driven protein engineering;2024-02-21

3. Addressing data scarcity in protein fitness landscape analysis: A study on semi-supervised and deep transfer learning techniques;Information Fusion;2024-02

4. An interpretable composite CNN and GRU for fine-grained martial arts motion modeling using big data analytics and machine learning;Soft Computing;2024-01-05

5. Machine Learning-Assisted Engineering of Light, Oxygen, Voltage Photoreceptor Adduct Lifetime;JACS Au;2023-11-21