Efficient feature extraction from highly sparse binary genotype data for cancer prognosis prediction using an auto-encoder

Published: 2023-01-10
Volume: 12
ISSN: 2234-943X
Container-title: Frontiers in Oncology
Short-container-title: Front. Oncol.

Author:

Shen Junjie, Li Huijun, Yu Xinghao, Bai Lu, Dong Yongfei, Cao Jianping, Lu Ke, Tang Zaixiang

Abstract

Genomics, which involves tens of thousands of genes, is a complex system that determines phenotype. An interesting and vital question is how to integrate highly sparse genomic data with a large number of minor effects into a prediction model to improve predictive power. We find that deep learning can extract features effectively by transforming highly sparse dichotomous data into lower-dimensional continuous data in a non-linear way. This may benefit risk prediction based on genotype data. We developed a multi-stage strategy to extract information from highly sparse binary genotype data and applied it to cancer prognosis. Specifically, we first reduced the binary biomarkers to a moderate number via a univariable regression model. Then, a trainable auto-encoder was used to learn compact features from the reduced data. Next, we used LASSO regression to select the optimal combination of extracted features. Lastly, we applied this feature combination to real cancer prognostic models and evaluated the predictive effect of the models. The results indicated that these compressed, transformed features could improve on the models’ original predictive performance and might avoid overfitting. This idea may be informative for everyone involved in cancer research, risk reduction, treatment, and patient care who integrates genomics data.
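The multi-stage strategy described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the outcome is continuous rather than a survival outcome, the univariable step ranks markers by absolute correlation instead of fitting a univariable regression model, and the names `univariable_screen`, `train_autoencoder`, and `lasso_cd` are hypothetical. The auto-encoder architecture, layer sizes, and all tuning values are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 200 samples, 500 sparse binary markers.
n, p = 200, 500
X = (rng.random((n, p)) < 0.05).astype(float)
beta = np.zeros(p)
beta[:20] = 1.0                          # 20 markers carry a true effect
y = X @ beta + rng.normal(0, 0.5, n)     # continuous outcome (simplification)

def univariable_screen(X, y, keep):
    """Stage 1: keep the `keep` markers with the largest |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[::-1][:keep]

def train_autoencoder(X, k, epochs=300, lr=0.5):
    """Stage 2: one-hidden-layer auto-encoder (tanh encoder, sigmoid decoder)
    trained by full-batch gradient descent on binary cross-entropy."""
    n, p = X.shape
    W1 = rng.normal(0, 0.1, (p, k)); b1 = np.zeros(k)
    W2 = rng.normal(0, 0.1, (k, p)); b2 = np.zeros(p)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                      # compact continuous code
        R = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))      # reconstruction in (0, 1)
        # Cross-entropy summed over markers, averaged over samples.
        bce = X * np.log(R + 1e-9) + (1 - X) * np.log(1 - R + 1e-9)
        losses.append(-bce.sum() / n)
        dZ2 = (R - X) / n                             # gradient at decoder pre-activation
        dZ1 = (dZ2 @ W2.T) * (1 - H ** 2)             # back-propagate through tanh
        W2 -= lr * (H.T @ dZ2); b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1); b1 -= lr * dZ1.sum(axis=0)
    return (lambda A: np.tanh(A @ W1 + b1)), losses

def lasso_cd(Z, y, lam, iters=200):
    """Stage 3: LASSO by cyclic coordinate descent, minimizing
    (1/2n)||y - Zb||^2 + lam * ||b||_1 on standardized features."""
    n, k = Z.shape
    Zs = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)
    r = y - y.mean()                                  # residual with b = 0
    b = np.zeros(k)
    for _ in range(iters):
        for j in range(k):
            r += Zs[:, j] * b[j]                      # remove feature j's contribution
            rho = Zs[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-threshold
            r -= Zs[:, j] * b[j]
    return b

idx = univariable_screen(X, y, keep=50)               # reduce to a moderate size
encode, losses = train_autoencoder(X[:, idx], k=8)    # learn compact features
Z = encode(X[:, idx])
coef = lasso_cd(Z, y, lam=0.05)                       # select extracted features
selected = np.flatnonzero(coef)
print("features kept by LASSO:", selected)
```

The selected columns of `Z` would then enter a prognostic model. In a real analysis the screening step uses the outcome, so it would need to be repeated inside each cross-validation fold to avoid optimistic performance estimates.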

Publisher

Frontiers Media SA

Subject

Cancer Research,Oncology

References (37 articles)

1. Integrative omics for health and disease; Karczewski; Nat Rev Genet, 2018

2. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences; Manzoni; Briefings Bioinf, 2018

3. Deep learning in cancer diagnosis, prognosis and treatment selection; Tran; Genome Med, 2021

4. Safe feature elimination in sparse supervised learning; El Ghaoui; Pacific J Optimization, 2012

5. Regression shrinkage and selection via the lasso; Tibshirani; J R Stat Soc Ser B-Methodological, 1996

Cited by 2 articles.

1. Deep Neural Network Integrated into Network-Based Stratification (D3NS): A Method to Uncover Cancer Subtypes from Somatic Mutations; Cancers; 2024-08-14

2. MRI-based radiomics for preoperative prediction of recurrence and metastasis in rectal cancer; Abdominal Radiology; 2024-02-26