ADSVAE: An Adaptive Density-aware Spectral Clustering Method for Multi-omics Data Based on Variational Autoencoder

Published: 2023-07 Issue: 6 Volume: 18 Page: 527-536
ISSN: 1574-8936
Container-title: Current Bioinformatics
Language: en
Short-container-title: CBIO

Author:

Zhao Jianping¹, Guan Qi¹, Zheng Chunhou², Cao Qingqing¹

Affiliation:

1. College of Mathematics and System Sciences, Xinjiang University, Urumqi, China

2. School of Artificial Intelligence, Anhui University, Hefei, China

Abstract

Introduction: The discovery of tumor subtypes helps to explore tumor pathogenesis, assess the feasibility of clinical treatment, and improve patient survival. Clustering analysis is increasingly applied to multi-omics data. However, because multi-omics data are diverse and complex, developing a comprehensive clustering algorithm for tumor molecular typing remains challenging.

Methods: In this study, we present an adaptive density-aware spectral clustering method based on a variational autoencoder (ADSVAE). First, ADSVAE learns the underlying spatial information of each omics data using a variational autoencoder (VAE) based on the Wasserstein distance metric. Second, a similarity matrix is built for each gene set using an adaptive density-aware kernel. Third, tensor product graphs (TPGs) are used to merge the different data sources and reduce noise. Finally, ADSVAE applies a spectral clustering algorithm and uses a Gaussian mixture model (GMM) to cluster the final eigenvector matrix and identify cancer subtypes.

Results: We tested ADSVAE on five TCGA datasets, and it performed well on all of them in comparison with several advanced multi-omics clustering algorithms. Compared with existing multi-omics clustering algorithms, the Wasserstein-distance-based variational autoencoder in ADSVAE can learn the underlying spatial information of each omics data, which helps it model complex data distributions. The self-tuning density-aware kernel enhances the similarity between points that share near neighbors, and data integration and diffusion over tensor product graphs further reduce noise and reveal the underlying structure, improving clustering performance.

Conclusion: Computational approaches to cancer subtype identification have inherent limitations. Although this paper draws some conclusions on the related issues, clustering-based cancer subtype identification from genomic data needs further improvement and refinement as research in related fields continues to deepen.
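The kernel and spectral steps named in the Methods can be illustrated with a small sketch. The abstract does not give the kernel's formula, so the form used here, a Gaussian kernel with per-sample scales and a shared-neighbor count, is an assumption borrowed from the general self-tuning, density-aware kernel literature and may differ from the paper's definition. The function names, the parameters `k` and `eps_quantile`, and the synthetic inputs are all hypothetical. The VAE step is skipped (random features stand in for its output), and a plain average of the per-omics kernels stands in for tensor product graph fusion.

```python
import numpy as np


def density_aware_kernel(X, k=7, eps_quantile=0.1):
    """Self-tuning, density-aware affinity (assumed form, not taken from the paper).

    S_ij = exp(-d_ij^2 / (sigma_i * sigma_j * (CNN_ij + 1)))
    sigma_i: distance from sample i to its k-th nearest neighbor.
    CNN_ij:  number of samples lying within radius eps of both i and j.
    """
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)             # squared Euclidean distances
    d = np.sqrt(d2)
    sigma = np.sort(d, axis=1)[:, k]            # column 0 is the sample itself
    sigma = np.maximum(sigma, 1e-12)            # guard against duplicate samples
    # eps is set from a quantile of all pairwise distances (a hypothetical choice).
    eps = np.quantile(d[np.triu_indices_from(d, k=1)], eps_quantile)
    near = (d <= eps).astype(float)             # eps-neighborhood indicator
    cnn = near @ near.T                         # shared-neighbor counts
    return np.exp(-d2 / (sigma[:, None] * sigma[None, :] * (cnn + 1.0)))


def spectral_embedding(S, n_clusters):
    """Row-normalized top eigenvectors of D^{-1/2} S D^{-1/2}."""
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    M = inv_sqrt[:, None] * S * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)                 # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]                   # eigenvectors of the largest ones
    return U / np.linalg.norm(U, axis=1, keepdims=True)


# Synthetic stand-ins for two omics layers measured on the same 40 samples.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
omics_a = rng.normal(loc=10.0 * labels[:, None], scale=0.5, size=(40, 2))
omics_b = rng.normal(loc=10.0 * labels[:, None], scale=0.5, size=(40, 3))

# Stand-in fusion: a plain average instead of tensor product graph diffusion.
fused = (density_aware_kernel(omics_a) + density_aware_kernel(omics_b)) / 2
embedding = spectral_embedding(fused, n_clusters=2)
```

In the method described above, the rows of `embedding` would then be clustered with a Gaussian mixture model, for example scikit-learn's `GaussianMixture`, to obtain the subtype assignments.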

Funder

Open Fund of Information Materials and Intelligent Sensing Laboratory of Anhui Province

Xinjiang Autonomous Region University Research Program

National Natural Science Foundation of China

Publisher

Bentham Science Publishers Ltd.

Subject

Computational Mathematics, Genetics, Molecular Biology, Biochemistry
