A self-training subspace clustering algorithm based on adaptive confidence for gene expression data

Published: 2023-03-21 Volume: 14
ISSN: 1664-8021
Container-title: Frontiers in Genetics
Short-container-title: Front. Genet.

Author:

Li Dan, Liang Hongnan, Qin Pan, Wang Jia

Abstract

Gene clustering is an important technique for identifying co-expressed gene groups from gene expression data, and it provides a powerful tool for investigating the functional relationships of genes in biological processes. Self-training is an important semi-supervised learning method and has performed well on gene clustering problems. However, the self-training process inevitably suffers from mislabeling, and accumulated mislabeling degrades semi-supervised learning performance on gene expression data. To solve this problem, this paper proposes a self-training subspace clustering algorithm based on adaptive confidence for gene expression data (SSCAC), which combines the low-rank representation of gene expression data with adaptive adjustment of label confidence to better guide the partition of unlabeled data. The advantages of the proposed SSCAC algorithm are mainly reflected in the following aspects. 1) To improve the discriminative property of gene expression data, a low-rank representation with a distance penalty is used to mine the potential subspace structure of the data. 2) To address mislabeling in self-training, a semi-supervised clustering objective function with label confidence is proposed, and a self-training subspace clustering framework is constructed on this basis. 3) To mitigate the negative impact of mislabeled data, an adaptive adjustment strategy for label confidence based on the gravitational search algorithm is proposed. In extensive experiments on two benchmark gene expression datasets, the SSCAC algorithm outperformed a variety of state-of-the-art unsupervised and semi-supervised learning algorithms.
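
The self-training idea summarized in the abstract (pseudo-label unlabeled samples, but only trust labels with high confidence) can be illustrated with a minimal, generic sketch. This is not the SSCAC algorithm: it omits the low-rank representation and the gravitational search step, and it fixes each confidence value at the moment a pseudo-label is accepted instead of adapting it afterwards. The function name `self_train_centroids`, the nearest-centroid model, the softmax confidence, and the `threshold` parameter are all assumptions made for illustration.

```python
import numpy as np

def self_train_centroids(X, y, n_classes, threshold=0.8, max_iter=20):
    """Generic confidence-thresholded self-training (illustrative, not SSCAC).

    X: (n_samples, n_features) array.
    y: (n_samples,) int array; class index for labeled rows, -1 for unlabeled.
    Assumes every class has at least one labeled row.
    Returns (labels, confidence); rows never pseudo-labeled keep label -1.
    """
    labels = np.asarray(y).copy()
    # Given labels are fully trusted; unlabeled rows start with zero confidence.
    confidence = np.where(labels >= 0, 1.0, 0.0)
    for _ in range(max_iter):
        # Confidence-weighted centroid of each class from currently labeled rows.
        centroids = np.stack([
            np.average(X[labels == k], axis=0, weights=confidence[labels == k])
            for k in range(n_classes)
        ])
        # Soft assignment: softmax over negative squared distances to centroids.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        logits = -d2
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        pred = probs.argmax(axis=1)
        top = probs.max(axis=1)
        # Pseudo-label only unlabeled rows whose top probability clears the threshold.
        accept = (labels < 0) & (top >= threshold)
        if not accept.any():
            break
        labels[accept] = pred[accept]
        confidence[accept] = top[accept]
    return labels, confidence
```

In this sketch a mislabeled sample keeps influencing its centroid once accepted, which is the error accumulation the abstract describes; the paper's stated remedy is to adjust label confidence adaptively, which this example does not attempt.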

Publisher

Frontiers Media SA

Subject

Genetics (clinical), Genetics, Molecular Medicine
