EnZymClass: Substrate specificity prediction tool of plant acyl-ACP thioesterases based on Ensemble Learning

Author:

Banerjee DeeproORCID,Jindra Michael A.,Linot Alec J.ORCID,Pfleger Brian F.ORCID,Maranas Costas D.ORCID

Abstract

AbstractClassification of proteins into their respective functional categories remains a long-standing key challenge in computational biology. Machine Learning (ML) based discriminative algorithms have been used extensively to address this challenge; however, the presence of small-sized, noisy, unbalanced protein classification datasets where high sequence similarity does not always imply identical functional properties have prevented robust prediction performance. Herein we present a ML method, Ensemble method for enZyme Classification (EnZymClass), that is specifically designed to address these issues. EnZymClass makes use of 47 alignment-free feature extraction techniques as numerically encoded descriptors of protein sequences to construct a stacked ensemble classification scheme capable of categorizing proteins based on their functional attributes. We used EnZymClass to classify plant acyl-ACP thioesterases (TEs) into short, long and mixed free fatty acid substrate specificity categories. While general guidelines for inferring substrate specificity have been proposed before, prediction of chain-length preference from primary sequence has remained elusive. EnZymClass achieved high classification metric scores on the TE substrate specificity prediction task (average accuracy score of 0.8, average precision and recall scores of 0.87 and 0.89 respectively on medium-chain TE prediction) producing accuracy scores that are about twice as effective at avoiding misclassifications than existing similarity-based methods of substrate specificity prediction. By applying EnZymClass to a subset of TEs in the ThYme database, we identified two acyl-ACP TE, ClFatB3 and CwFatB2, with previously uncharacterized activity in E. coli fatty acid production hosts. We incorporated modifications into ClFatB3 established in prior TE engineering studies, resulting in a 4.2-fold overall improvement in observed C10 titers over the wildtype enzyme.EnZymClass can be readily applied to other protein classification challenges and is available at: https://github.com/deeprob/ThioesteraseEnzymeSpecificityAuthor SummaryThe natural diversity of proteins has been harnessed to serve specialized applications in various fields, including medicine, renewable chemical production, and food and agriculture. Acquiring and characterizing new proteins to meet a given application, however, can be an expensive process, requiring selection from thousands to hundreds of thousands of candidates in a database and subsequent experimental screening. Using amino acid sequence to predict a protein’s function has been demonstrated to accelerate this process, however standard approaches require information on previously characterized proteins and their respective sequences. Obtaining the necessary amount of data to accurately infer sequence-function relationships can be prohibitive, especially with a low-throughput testing cycle. Here, we present EnZymClass, a model that is specifically designed to work with small to medium-sized protein sequence datasets and retain high prediction performance of function. We applied EnZymClass to predict the presence or absence of a desired function among acyl-ACP thioesterases, a key enzyme class used in the production of renewable oleochemicals in microbial hosts. By training EnZymClass on only 115 functionally characterized enzyme sequences, we were able to successfully detect two plant acyl-ACP thioesterases with the desired specialized function among 617 sequences in the ThYme database.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3