Affiliation:
1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
2. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
3. Department of Information Center, Weihai Ocean Vocational College, Rongcheng 264300, China
Abstract
Designing an integrated method that discovers cancer subtypes and explains cancer heterogeneity from multiple types of genomic data is a crucial task. In recent years, several clustering algorithms have been proposed and applied to cancer subtype prediction. Among them, similarity network fusion (SNF) integrates multiple types of genomic data to identify cancer subtypes, which improves the understanding of tumorigenesis. SNF uses a dense similarity matrix to capture the global information of the data, but connections between samples from different categories introduce noise. Constructing a more robust dense similarity matrix is therefore important for improving cancer subtype identification. In this paper, we propose similarity network fusion based on random walk and relative entropy (R2SNF) for cancer subtype prediction. First, a random walk algorithm is used to capture the complex relationships between samples in each genomic data type and to obtain the transition probability distribution of the samples in the network. If two samples belong to the same class, the transition probability between them is high; if they belong to different classes, it is low. The degree of correlation between samples is thus captured, which reduces the noise caused by connections between samples from different categories. Second, relative entropy is used to measure the difference between the transition probability distributions of samples, yielding a better dense similarity matrix that contains structural similarity information between samples. Third, the dense similarity matrix is iteratively fused with the KNN similarity matrix to construct the fused similarity matrix over all genomic data types. Finally, spectral clustering groups the samples in the fused similarity matrix into clusters, which correspond to the cancer subtypes. Experiments on seven cancer omics datasets show that the R2SNF algorithm performs well in identifying cancer subtypes.
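The abstract does not give the formulas behind the random walk and relative entropy steps, so the following is only a minimal sketch of how a dense similarity matrix of this kind could be built. It assumes a Gaussian affinity on Euclidean distances, a random walk with restart, and a symmetrized Kullback-Leibler divergence mapped to a similarity through exp(-d). The function names and the parameters `sigma`, `restart`, and `eps` are hypothetical and are not taken from the paper.

```python
import numpy as np


def transition_matrix(X, sigma=1.0):
    """Row-stochastic transition matrix from a Gaussian affinity (assumed kernel)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops in the sample graph
    return W / W.sum(axis=1, keepdims=True)


def random_walk_with_restart(P, restart=0.3):
    """Row i is the stationary distribution of a walker that restarts at sample i.

    Closed form: R = restart * (I - (1 - restart) * P)^(-1).
    """
    n = P.shape[0]
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)


def relative_entropy_similarity(R, eps=1e-12):
    """Similarity exp(-d), where d is the symmetrized KL divergence between rows of R."""
    R = R + eps  # avoid log(0)
    R = R / R.sum(axis=1, keepdims=True)
    log_R = np.log(R)
    self_term = np.sum(R * log_R, axis=1)   # sum_k p_ik * log p_ik
    cross_term = R @ log_R.T                # [i, j] = sum_k p_ik * log p_jk
    kl = self_term[:, None] - cross_term    # kl[i, j] = KL(p_i || p_j)
    return np.exp(-0.5 * (kl + kl.T))


def dense_similarity(X, sigma=1.0, restart=0.3):
    """Random walk followed by relative entropy, for one data type."""
    P = transition_matrix(X, sigma)
    R = random_walk_with_restart(P, restart)
    return relative_entropy_similarity(R)
```

In this sketch, samples in the same tight group end up with similar walk distributions and therefore a small divergence, while samples in different groups get a similarity close to zero. The resulting matrix would take the place of the dense similarity matrix in the SNF fusion step, which is not shown here.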
Funder
National Natural Science Foundation of China
Subject
Computer Science Applications, Software
Cited by
1 article.