Text mining for modeling of protein complexes enhanced by machine learning

Published:2020-09-22 Issue:4 Volume:37 Page:497-505
ISSN:1367-4803
Container-title:Bioinformatics
language:en
Author:

Badal Varsha D¹,Kundrotas Petras J¹,Vakser Ilya A¹²

Affiliation:

1. Computational Biology Program

2. Department of Molecular Biosciences, The University of Kansas, Lawrence, KS 66045, USA

Abstract

Motivation

Procedures for structural modeling of protein–protein complexes (protein docking) produce a number of models that need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein–protein interactions may generate such constraints. However, the absence of post-processing of the spotted residues reduced the usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins.

Results

We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models, with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage. The reason is that SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles.

Availability and implementation

The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04.

Supplementary information

Supplementary data are available at Bioinformatics online.
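The residue "spotting" step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline (their code is at the GitLab link above); it is a minimal example that assumes residues are written as a three-letter amino acid code followed by a sequence number. The names `THREE_TO_ONE`, `RESIDUE_PATTERN` and `spot_residues` are hypothetical, and the subsequent interface/non-interface classification by SVM or DRNN is not shown.

```python
import re

# Three-letter amino acid codes mapped to one-letter codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed mention formats: "Arg123", "Arg-123" or "Arg 123" (case-sensitive).
RESIDUE_PATTERN = re.compile(
    r"\b(" + "|".join(THREE_TO_ONE) + r")[- ]?(\d{1,4})\b"
)


def spot_residues(text):
    """Return (one_letter_code, sequence_number) pairs in order of appearance."""
    return [
        (THREE_TO_ONE[name], int(number))
        for name, number in RESIDUE_PATTERN.findall(text)
    ]


if __name__ == "__main__":
    sentence = (
        "Mutation of Arg123 and Lys-45 abolished binding, "
        "whereas Gly 7 was dispensable."
    )
    print(spot_residues(sentence))
```

A pattern like this finds every residue mention regardless of its role in binding, which is the limitation the abstract describes: the spotted residues still have to be filtered for relevance to the interface.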

Funder

NIH

NSF

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa823/33780178/btaa823.pdf

References: 65 articles.

1. Text mining for protein docking;Badal;PLoS Comput. Biol,2015

2. Natural language processing in text mining for structural modeling of protein complexes;Badal;BMC Bioinformatics,2018

3. Representation learning: a review and new perspectives;Bengio;IEEE Trans. Patt. Anal. Mach. Intell,2013

Cited by 4 articles.

1. Pan-Cancer Analysis of PGAM1 and Its Experimental Validation in Uveal Melanoma Progression;Journal of Cancer;2024

2. Integrative Analysis of the Role of TP53 in Human Pan-Cancer;Current Issues in Molecular Biology;2023-11-29

3. Artificial intelligence: a virtual chemist for natural product drug discovery;Journal of Biomolecular Structure and Dynamics;2023-05-26

4. Natural product drug discovery in the artificial intelligence era;Chemical Science;2022