Affiliation:
1. College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
2. School of Artificial Intelligence, Anhui University, Hefei, China
Abstract
Introduction:
The discovery of tumor subtypes helps researchers explore tumor pathogenesis, assess the feasibility of clinical treatment, and improve patient survival. Clustering analysis is increasingly applied to multi-omics data. However, because multi-omics data are diverse and complex, developing a comprehensive clustering algorithm for tumor molecular subtyping remains challenging.
Methods:
In this study, we present an adaptive density-aware spectral clustering method based on a variational autoencoder (ADSVAE). First, ADSVAE learns the underlying latent-space structure of each omics dataset using a variational autoencoder (VAE) based on the Wasserstein distance metric. Second, it builds a similarity matrix for each omics dataset using an adaptive density-aware kernel. Third, it uses tensor product graphs (TPGs) to integrate the different data sources and reduce noise. Finally, ADSVAE applies spectral clustering and uses a Gaussian mixture model (GMM) to cluster the final eigenvector matrix and identify cancer subtypes.
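The abstract does not give the formula for the adaptive density-aware kernel. The sketch below is therefore a hedged illustration only, under stated assumptions. It uses a self-tuning Gaussian kernel in which each point's local scale sigma_i is its distance to its k-th nearest neighbor. The pairwise bandwidth sigma_i * sigma_j is further widened by (SNN_ij + 1), where SNN_ij is the number of nearest neighbors that points i and j share. As a result, pairs with many shared neighbors receive higher similarity, which matches the behavior the Results section attributes to the kernel. The function name `density_aware_affinity`, the neighborhood size `k`, and the exact form W_ij = exp(-d_ij^2 / (sigma_i sigma_j (SNN_ij + 1))) are assumptions, not ADSVAE's published definition. The input would presumably be one omics dataset's VAE latent embedding.

```python
import numpy as np


def density_aware_affinity(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical density-aware, self-tuning affinity for one omics view.

    X: (n_samples, n_features) matrix, assumed here to be a VAE latent embedding.
    Returns a symmetric (n_samples, n_samples) similarity matrix with zero diagonal.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances (exactly symmetric, zero on the diagonal).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # k nearest neighbors of each point, excluding the point itself.
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)
    knn = np.argsort(D_no_self, axis=1)[:, :k]

    # Local scale sigma_i: distance to the k-th nearest neighbor (self-tuning bandwidth).
    sigma = np.maximum(D[np.arange(n), knn[:, -1]], 1e-12)

    # Shared-nearest-neighbor counts: snn[i, j] = size of the overlap of kNN(i) and kNN(j).
    M = np.zeros((n, n))
    M[np.arange(n)[:, None], knn] = 1.0
    snn = M @ M.T

    # More shared neighbors -> wider effective bandwidth -> higher similarity.
    W = np.exp(-(D ** 2) / (np.outer(sigma, sigma) * (snn + 1.0)))
    np.fill_diagonal(W, 0.0)
    return W
```

Because sigma_i adapts to local density, a point in a sparse region receives a wider bandwidth than a point in a dense region. Under this assumed form, the (SNN + 1) factor is the term that rewards pairs sharing neighbors.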
Results:
We tested ADSVAE on five TCGA datasets, on all of which it performed well in comparison with several state-of-the-art multi-omics clustering algorithms. Compared with existing multi-omics clustering algorithms, the Wasserstein-distance-based VAE in ADSVAE learns the underlying latent-space structure of each omics dataset and is better at capturing complex data distributions. The self-tuning density-aware kernel used by ADSVAE increases the similarity between points that share nearest neighbors. In addition, TPG-based data integration and diffusion further reduce noise and reveal the underlying structure, which improves clustering performance.
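The following is a hedged sketch of how tensor product graph (TPG) integration and diffusion can fuse and denoise several similarity matrices before spectral clustering. It assumes one precomputed similarity matrix per omics type. Several details are assumptions, not ADSVAE's published procedure:

- each matrix is sparsified to its k strongest neighbors and row-normalized;
- the views are fused by simple averaging;
- the damping factor `alpha` keeps the diffusion convergent.

The helper names `knn_transition`, `fuse_and_diffuse`, and `spectral_embedding` are hypothetical. The iteration Q <- A Q A^T + I is the standard way to diffuse on the tensor product graph A (x) A without forming the n^2 x n^2 Kronecker matrix. The embedding step follows the usual normalized-affinity recipe.

```python
import numpy as np


def knn_transition(W: np.ndarray, k: int) -> np.ndarray:
    """Keep each row's k strongest off-diagonal similarities, then row-normalize."""
    n = W.shape[0]
    W = W.astype(float)                       # copy, so the caller's matrix is untouched
    np.fill_diagonal(W, -np.inf)              # never pick a point as its own neighbor
    nbrs = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    S = np.zeros((n, n))
    S[rows, nbrs] = W[rows, nbrs]
    return S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)


def fuse_and_diffuse(views, k=5, alpha=0.9, tol=1e-10, max_iter=1000):
    """Average the per-omics kNN transition matrices, then run TPG diffusion.

    Iterating Q <- A Q A^T + I sums the series over (A kron A)^t applied to vec(I),
    i.e. diffusion on the tensor product graph, without building the Kronecker matrix.
    """
    S = np.mean([knn_transition(W, k) for W in views], axis=0)
    A = alpha * S                             # row sums <= alpha < 1, so the series converges
    n = A.shape[0]
    I = np.eye(n)
    Q = I.copy()
    for _ in range(max_iter):
        Q_next = A @ Q @ A.T + I
        converged = np.max(np.abs(Q_next - Q)) < tol
        Q = Q_next
        if converged:
            break
    return (Q + Q.T) / 2                      # symmetrize away floating-point round-off


def spectral_embedding(W: np.ndarray, n_clusters: int) -> np.ndarray:
    """Rows of the top eigenvectors of D^-1/2 W D^-1/2, each scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)               # eigenvalues returned in ascending order
    U = vecs[:, -n_clusters:]
    return U / np.linalg.norm(U, axis=1, keepdims=True)
```

The final step in the abstract fits a GMM to the rows of this eigenvector matrix. That step is omitted here. One off-the-shelf option would be scikit-learn's `GaussianMixture(n_components=n_clusters).fit_predict(U)`.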
Conclusion:
Although this paper reaches some conclusions on these issues, computational approaches to cancer subtype identification have inherent limitations. As research in related fields deepens, clustering methods that identify cancer subtypes from genomic data will need further improvement and refinement.
Funder
Open Fund of the Information Materials and Intelligent Sensing Laboratory of Anhui Province
Xinjiang Autonomous Region University Research Program
National Natural Science Foundation of China
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry