Optimizing InterProScan representation generates a surprisingly good protein function prediction method

Published: 2022-08-13

Author:

Tiittanen Henri, Holm Liisa, Törönen Petri

Abstract

Motivation: Automated protein Function Prediction (AFP) is an intensively studied topic. Most of this research focuses on methods that combine multiple data sources, while fewer articles look for the most efficient ways to use a single data source. We therefore tested how different preprocessing methods and classifiers perform in the AFP task when they are given only the output of InterProScan (IPS). In particular, we present novel preprocessing methods, less commonly used classifiers, and the inclusion of species taxonomy. We also test classifier stacking as a way to combine the results of the tested classifiers. The methods are tested with in-house data and with CAFA3 competition evaluation data.

Results: We show that adding IPS localisation and taxonomy to the data improves results. Stacking also improves performance. Surprisingly, our best-performing methods outperformed all international CAFA3 competition participants in most tests. Altogether, the results show that preprocessing and classifier combinations are beneficial in the AFP task.

Contact: petri.toronen(AT)helsinki.fi

Supplementary information: Supplementary text is available at the project web site http://ekhidna2.biocenter.helsinki.fi/AFP/ and at the end of this document.
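As a rough illustration of what adding IPS localisation and taxonomy to the data could look like, the sketch below turns a protein's InterProScan matches into a sparse feature dictionary. It is not the authors' preprocessing: the `Hit` record, the three-bin position scheme, the feature-name prefixes, and the example accessions and lineage are all assumptions made for this example, and "localisation" is read here as the position of a match along the sequence.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One InterProScan match (hypothetical minimal record)."""
    signature: str  # accession of the matched signature
    start: int      # 1-based start of the match on the protein
    end: int        # 1-based end of the match on the protein


def location_bin(hit: Hit, seq_len: int, n_bins: int = 3) -> int:
    """Map the midpoint of a match to one of n_bins equal-width sequence segments."""
    midpoint = (hit.start + hit.end) / 2
    # min() keeps a match that ends on the last residue inside the final bin.
    return min(int(midpoint / seq_len * n_bins), n_bins - 1)


def encode(hits, seq_len, lineage):
    """Build a sparse feature dict: signature, signature-at-position, taxonomy."""
    features = {}
    for hit in hits:
        # Plain presence of the signature (assumed baseline representation).
        features[f"ips:{hit.signature}"] = 1.0
        # Presence of the signature in a coarse region of the sequence.
        features[f"ips:{hit.signature}@bin{location_bin(hit, seq_len)}"] = 1.0
    for taxon in lineage:
        # One indicator per taxon on the species lineage.
        features[f"tax:{taxon}"] = 1.0
    return features


# Illustrative input only; accessions and lineage are placeholders.
example = encode(
    hits=[Hit("IPR000719", 10, 90), Hit("IPR011009", 250, 300)],
    seq_len=300,
    lineage=["Eukaryota", "Metazoa"],
)
```

A dictionary like `example` could then be vectorised and passed to any classifier; the choice of bin count and of which taxonomic ranks to include would need to be tuned on real data.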

Publisher

Cold Spring Harbor Laboratory
