
Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study

Published: 2021-05-18; Volume: 26; Issue: 2; Page: 40
ISSN: 2297-8747
Journal: Mathematical and Computational Applications
Language: English
Journal abbreviation: MCA

Authors:

Michael W. Daniels, Daniel Dvorkin, Rani K. Powers, Katerina Kechris

Abstract

Integrating gene-level data is useful for predicting the role of genes in biological processes. Work on this problem has typically relied on supervised classification, which requires large training sets of positive and negative examples. However, training sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses a limited set of positively labeled genes for semi-supervised learning. We focus on predicting essential genes, that is, genes required for the survival of an organism under particular conditions. Using cross-validation, we found that including positively labeled samples in a semi-supervised framework with the hierarchical mixture model improves the detection of essential genes compared with unsupervised, supervised, and other semi-supervised approaches. Prediction performance also improved in the scenario where some genes were incorrectly assumed to be non-essential. Our comparisons indicate that incorporating even small amounts of existing knowledge improves prediction accuracy and reduces the variability of predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework apply broadly to problems that involve predicting genes or other features characterized by multiple data types, when only a small set of positive labels is available.
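The abstract does not give the model's equations, so what follows is only a minimal sketch of the semi-supervised idea under clearly stated assumptions. It swaps in a plain two-component Gaussian mixture on a single hypothetical feature score (not the authors' hierarchical mixture over multiple data types), fitted by expectation-maximization (EM). Genes with a positive (essential) label have their essential-component responsibility fixed at 1, while every other gene, including unlabeled essentials, receives a soft responsibility. The function names, synthetic data, and parameters are all illustrative.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of the normal distribution N(mu, sd**2) evaluated at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def semi_supervised_em(x, positives, n_iter=500, tol=1e-9):
    """Simplified illustration, NOT the paper's hierarchical mixture model.

    Fits a two-component 1-D Gaussian mixture by EM. Component 1 stands for
    "essential"; genes whose indices are in `positives` are known essentials
    and have their component-1 responsibility clamped to 1. All other genes
    are unlabeled and receive soft responsibilities. Returns the fitted
    parameters and resp[i], the estimated P(essential | x[i]).
    """
    positives = set(positives)
    n = len(x)
    tiny = 1e-300  # guards against log(0) and division by zero on underflow
    pos_vals = [x[i] for i in positives]
    unl_vals = [x[i] for i in range(n) if i not in positives]
    # Initialise component 1 from the labeled positives, component 0 from the
    # unlabeled genes, and both spreads from the pooled standard deviation.
    mu1 = sum(pos_vals) / len(pos_vals)
    mu0 = sum(unl_vals) / len(unl_vals)
    mean_all = sum(x) / n
    sd0 = sd1 = math.sqrt(sum((v - mean_all) ** 2 for v in x) / n)
    pi = 0.5  # mixing proportion of the essential component
    resp = [0.0] * n
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each gene is essential.
        ll = 0.0
        for i, v in enumerate(x):
            p1 = pi * normal_pdf(v, mu1, sd1)
            p0 = (1.0 - pi) * normal_pdf(v, mu0, sd0)
            if i in positives:
                resp[i] = 1.0  # label observed, so no uncertainty
                ll += math.log(max(p1, tiny))
            else:
                total = max(p0 + p1, tiny)
                resp[i] = p1 / total
                ll += math.log(total)
        # M-step: responsibility-weighted proportion, means and spreads.
        s1 = sum(resp)
        s0 = n - s1
        pi = s1 / n
        mu1 = sum(r * v for r, v in zip(resp, x)) / s1
        mu0 = sum((1.0 - r) * v for r, v in zip(resp, x)) / s0
        var1 = sum(r * (v - mu1) ** 2 for r, v in zip(resp, x)) / s1
        var0 = sum((1.0 - r) * (v - mu0) ** 2 for r, v in zip(resp, x)) / s0
        sd1 = max(math.sqrt(var1), 1e-6)
        sd0 = max(math.sqrt(var0), 1e-6)
        # Stop once the observed-data log-likelihood stops improving.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return {"pi": pi, "mu0": mu0, "sd0": sd0, "mu1": mu1, "sd1": sd1, "resp": resp}


# Usage on synthetic data: 200 non-essential genes around 0, 50 essential
# genes around 4, of which only 10 carry a positive label.
random.seed(0)
non_ess = [random.gauss(0.0, 1.0) for _ in range(200)]
ess = [random.gauss(4.0, 1.0) for _ in range(50)]
x = non_ess + ess
labeled = set(range(200, 210))
fit = semi_supervised_em(x, labeled)
print(round(fit["mu0"], 2), round(fit["mu1"], 2), round(fit["pi"], 2))
```

Clamping the labeled responsibilities lets a handful of positives anchor the "essential" component, which illustrates in a simplified way how a small set of positive labels can guide an otherwise unsupervised mixture. No negative labels are used, in keeping with the positive-only setting the abstract describes.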

Funder

National Institutes of Health

Publisher

MDPI AG

Subject

Applied Mathematics, Computational Mathematics, General Engineering

Link

https://www.mdpi.com/2297-8747/26/2/40/pdf


Cited by 1 article.

1. Machine learning: its challenges and opportunities in plant system biology; Applied Microbiology and Biotechnology; 2022-05