Affiliation:
1. Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais , Belo Horizonte, Minas Gerais, Brazil
Abstract
Abstract
Motivation
Insects possess a vast phenotypic diversity and key ecological roles. Several insect species also have medical, agricultural and veterinary importance as parasites and disease vectors. Therefore, strategies to identify potential essential genes in insects may reduce the resources needed to find molecular players in central processes of insect biology. However, most predictors of essential genes in multicellular eukaryotes using machine learning rely on expensive and laborious experimental data to be used as gene features, such as gene expression profiles or protein–protein interactions, even though some of this information may not be available for the majority of insect species with genomic sequences available.
Results
Here, we present and validate a machine learning strategy to predict essential genes in insects using sequence-based intrinsic attributes (statistical and physicochemical data) together with the predictions of subcellular location and transcriptomic data, if available. We gathered information available in public databases describing essential and non-essential genes for Drosophila melanogaster (fruit fly, Diptera) and Tribolium castaneum (red flour beetle, Coleoptera). We proceeded by computing intrinsic and extrinsic attributes that were used to train statistical models in one species and tested by their capability of predicting essential genes in the other. Even models trained using only intrinsic attributes are capable of predicting genes in the other insect species, including the prediction of lineage-specific essential genes. Furthermore, the inclusion of RNA-Seq data is a major factor to increase classifier performance.
Availability and implementation
The code, data and final models produced in this study are freely available at https://github.com/g1o/GeneEssentiality/.
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
Graduate Programs of Genetics
Bioinformatics (PPG-Bioinfo) of Universidade Federal de Minas Gerais
CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil) – Finance
Pró-Reitoria de Pesquisa-UFMG
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献