Abstract
Predicting structure-dependent functionalities of biomolecules is crucial for accelerating a wide variety of applications in drug screening, biosensing, disease diagnosis, and therapy. Although the commonly used structural “fingerprints” work for biomolecules in traditional informatics implementations, they remain impractical in a wide range of machine learning approaches where the model is restricted to making data-driven decisions. Although peptides, proteins, and oligonucleotides have sequence-related propensities, representing them as sequences of letters, as is done in bioinformatics studies, loses most of their structure-related functionalities. Biomolecules that lack a sequence, such as polysaccharides, lipids, and their peptide conjugates, cannot be screened with models that use letter-based fingerprints. Here we introduce a new fingerprint, derived from valence shell electron pair repulsion structures for small peptides, that enables construction of structural feature maps for a given biomolecule regardless of its sequence or conformation. The feature map uses a simple encoding derived from the molecular graph (the atoms, bonds, distances, bond angles, etc., that make up each amino acid in the sequence), allowing a residual neural network model to take greater advantage of the information in molecular structure. We use short peptides that bind to Major Histocompatibility Class I protein alleles, encoded in terms of their extended structures, to predict allele-specific binding affinities of test peptides. Predictions are consistent, without appreciable loss in accuracy between models for sequences of different lengths, marking an improvement over current models. Biological processes are heterogeneous interactions, which justifies encoding all biomolecules universally in terms of their structures and relating those structures to their functionality.
The capabilities facilitated by the model expand the paradigm for establishing structure-function correlations among small molecules, short and longer sequences including large biomolecules, and genetic conjugates that may include polypeptides, polynucleotides, RNAs, lipids, peptidoglycans, peptido-lipids, and other biomolecules, which could be implemented in a wide range of medical and nanobiotechnological applications in the future.
Publisher
Cold Spring Harbor Laboratory