Affiliation:
1. University of Nottingham, UK
Abstract
A key open problem that has defied scientists for decades is predicting the 3D structure of a protein (Protein Structure Prediction, PSP) from its primary sequence: the amino acids that compose the protein chain. Full atomistic molecular dynamics simulations are, for all intents and purposes, impractical, as current empirical models may require massive computational resources. One way of alleviating this cost and making the problem easier is to simplify the protein representation over which the native 3D state is searched. We have proposed a protocol based on evolutionary algorithms to perform this simplification. Our protocol does not use any domain knowledge. Instead, it uses a well-known information theory metric, Mutual Information, to generate a reduced representation that maintains the crucial information needed for PSP. The evaluation of our method has shown that it generates alphabets whose performance is competitive with that of the original, non-simplified representation. Moreover, these reduced alphabets obtain better-than-human performance when compared to some classic reduced alphabets.
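To make the role of Mutual Information more concrete, the following is a minimal sketch of how a candidate reduced alphabet could be scored. It is not the authors' implementation: the abstract does not say which variables the metric is computed between, so the sketch assumes it is measured between the (reduced) residue letter and a per-residue structural label. The function names, the toy residues, the "buried"/"exposed" labels, and both groupings are hypothetical, and the evolutionary search that would propose groupings is omitted.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two paired sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def reduce_sequence(residues, grouping):
    """Map each amino acid letter to the letter of its group."""
    return [grouping[r] for r in residues]


# Hypothetical toy data: six residue types, each labelled by an assumed
# structural property (not taken from the source).
residues = list("AVLKDEAVLKDE")
labels = ["buried" if r in "AVL" else "exposed" for r in residues]

# Two hypothetical 20-letter-style reductions, shrunk to this toy alphabet.
informative = {"A": "h", "V": "h", "L": "h", "K": "p", "D": "p", "E": "p"}
uninformative = {"A": "x", "K": "x", "V": "y", "D": "y", "L": "z", "E": "z"}

mi_full = mutual_information(residues, labels)
mi_informative = mutual_information(reduce_sequence(residues, informative), labels)
mi_uninformative = mutual_information(reduce_sequence(residues, uninformative), labels)

print(f"full alphabet:          {mi_full:.3f} bits")
print(f"informative grouping:   {mi_informative:.3f} bits")
print(f"uninformative grouping: {mi_uninformative:.3f} bits")
```

In this toy setting the two-letter informative grouping keeps all of the information the full alphabet carries about the label, while the uninformative grouping keeps none. Under the stated assumption, an evolutionary algorithm could use such a score as a fitness function to prefer the first kind of grouping.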
Funder
Engineering and Physical Sciences Research Council
Publisher
Association for Computing Machinery (ACM)