Affiliation:
1. School of Computer and Information Engineering, Henan University, Kaifeng, China
2. Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
Abstract
Background:
Hepatocellular carcinoma (HCC) is a malignancy with a high mortality rate, and identifying relevant biomarkers of HCC is helpful for early diagnosis and patient care. Although some high-dimensional omics data contain intrinsic biomedical information about HCC, how to analyze them integratively and find promising biomarkers of HCC remains an important and difficult problem.
Methods:
We present a novel biomarker identification approach, named GEDNN, based on multi-omics data and a graph-embedded deep neural network. To achieve a more comprehensive understanding of HCC, we first collected and normalized three types of HCC-related data: DNA methylation, copy number variation (CNV), and gene expression. ANOVA was adopted to filter out redundant genes. We then measured the connectivity between gene pairs by their Pearson correlation coefficient and constructed a gene graph. Next, a graph-embedded feedforward neural network (DFN) and the back-propagation of a convolutional neural network (CNN) were combined to integratively analyze the three types of omics data and obtain importance scores for candidate gene biomarkers.
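The filtering and graph-construction steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `top_k` cut-off, the correlation `threshold`, the use of the absolute correlation, and the toy expression values are all hypothetical, since the abstract does not specify them.

```python
from itertools import combinations
from math import sqrt


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single gene across sample classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def build_gene_graph(expr, labels, top_k, threshold):
    """Keep the top_k genes by ANOVA F score, then connect strongly correlated pairs.

    expr maps a gene name to its per-sample values (assumed already normalized).
    Thresholding on |r| is an assumption made for this sketch.
    """
    scores = {gene: anova_f(vals, labels) for gene, vals in expr.items()}
    kept = sorted(scores, key=scores.get, reverse=True)[:top_k]
    edges = {
        (a, b)
        for a, b in combinations(sorted(kept), 2)
        if abs(pearson(expr[a], expr[b])) >= threshold
    }
    return kept, edges


# Toy data: six samples, three per class (made-up values for illustration).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "g2": [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # proportional to g1
    "g3": [1.0, 3.0, 2.0, 1.0, 3.0, 2.0],  # no difference between classes
    "g4": [5.0, 5.1, 4.9, 1.0, 1.1, 0.9],
}
kept, edges = build_gene_graph(expr, labels, top_k=3, threshold=0.9)
```

In this toy example the gene with identical class means (`g3`) is filtered out by ANOVA, and the remaining genes are linked because their profiles are strongly correlated or anti-correlated.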
Results:
Extensive experimental results showed that the biomarkers screened by the proposed method were effective in classifying and predicting HCC. Furthermore, gene analysis showed that the biomarkers screened by our method were strongly associated with the development of HCC.
Conclusion:
In this paper, we propose the GEDNN method to assess the importance of genes for more accurate identification of cancer biomarkers, which facilitates the effective classification of cancers. The proposed method is applied to multi-omics data of HCC, including RNASeq, DNAMeth and CNV, taking into account the complementary information among the different data types. We construct a gene graph from Pearson correlation coefficients as additional information for the DFN, thus reducing the importance scores of redundant genes. In addition, the proposed method incorporates the back-propagation of a CNN to further obtain the importance of features.
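One common way to supply a gene graph to a feedforward layer is to give the layer one hidden unit per gene and mask its weights with the graph's adjacency matrix, so that each unit only receives input from its own gene and that gene's neighbours. The sketch below illustrates that idea only. Whether GEDNN uses exactly this masking is an assumption, and the function name, the ReLU activation, and the toy values are hypothetical.

```python
def graph_embedded_layer(x, weights, bias, adjacency):
    """Forward pass of a graph-masked layer with one hidden unit per gene.

    x         : input value per gene
    weights   : weights[i][j] connects input gene i to hidden unit j
    bias      : bias per hidden unit
    adjacency : adjacency[i][j] is 1 if genes i and j are linked (self-loops included)
    Connections absent from the graph are skipped, which is equivalent to
    multiplying the weight matrix element-wise by the adjacency matrix.
    """
    p = len(x)
    hidden = []
    for j in range(p):
        z = bias[j] + sum(
            x[i] * weights[i][j] for i in range(p) if adjacency[i][j]
        )
        hidden.append(max(0.0, z))  # ReLU activation (assumed)
    return hidden


# Toy graph: genes 0 and 1 are linked, gene 2 is isolated.
adjacency = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
weights = [[1.0, 1.0, 1.0] for _ in range(3)]
bias = [0.0, 0.0, 0.0]
x = [1.0, 2.0, 4.0]
hidden = graph_embedded_layer(x, weights, bias, adjacency)
```

With unit weights, the hidden units for genes 0 and 1 each sum the inputs of both linked genes, while the unit for the isolated gene 2 sees only its own input.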
Funder
National Natural Science Foundation of China
Science and Technology Development Plan Project of Henan Province
China Postdoctoral Science Foundation
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry