A penalized integrative deep neural network for variable selection among multiple omics datasets

Published: 2024-06-07 Volume: 12 Issue: 3 Pages: 313-323
ISSN: 2095-4689
Journal: Quantitative Biology
Language: en
Journal abbreviation: Quant. Biol.

Author:

Li Yang¹, Ren Xiaonan¹, Yu Haochen¹, Sun Tao¹, Ma Shuangge²

Affiliation:

1. Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China

2. Department of Biostatistics, Yale University, New Haven, Connecticut, USA

Abstract

Deep learning has become increasingly popular in omics data analysis. Recent works incorporating variable selection into deep learning have greatly enhanced model interpretability. However, because deep learning requires a large sample size, existing methods may produce uncertain findings when the dataset is small, as is common in omics data analysis. With the explosion and availability of omics data from multiple populations/studies, existing methods naively pool these data into one dataset to increase the sample size, ignoring that variable structures can differ across datasets, which may lead to inaccurate variable selection. We propose a penalized integrative deep neural network (PIN) to simultaneously select important variables from multiple datasets. PIN directly aggregates multiple datasets as input and accounts for both homogeneity and heterogeneity among them in an integrative analysis framework. Results from extensive simulation studies and from applications of PIN to gene expression datasets from elderly individuals with different cognitive statuses and from ovarian cancer patients at different stages demonstrate that PIN considerably outperforms existing methods across multiple datasets. The source code is freely available on GitHub (rucliyang/PINFunc). We anticipate that the proposed PIN method will promote the identification of disease‐related important variables based on multiple studies/datasets from diverse origins.
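The abstract does not spell out the form of PIN's penalty, so the following is only a minimal sketch, under stated assumptions, of how a penalized integrative network can encode homogeneity and heterogeneity in variable selection. It assumes a sparse-group-lasso-style penalty on the first-layer input weights: one group per variable spanning all datasets (encouraging shared selection) and one subgroup per variable within each dataset (allowing dataset-specific selection). All names (`integrative_penalty`, `prox_integrative`, `lam_homo`, `lam_het`) and the toy weights are hypothetical and are not taken from the PINFunc repository.

```python
import math
from typing import List

# Hypothetical layout (assumption): W[k][j] holds the first-layer weights
# leaving input variable j in dataset k (one weight per hidden unit).
# Variable j counts as "selected" in dataset k when W[k][j] is not all zero.
Weights = List[List[List[float]]]


def l2(v: List[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def group_shrink(v: List[float], t: float) -> List[float]:
    """Block soft-thresholding: the proximal operator of t * ||v||_2."""
    n = l2(v)
    if n <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / n
    return [scale * x for x in v]


def integrative_penalty(W: Weights, lam_homo: float, lam_het: float) -> float:
    """Assumed two-level penalty on input weights (not PIN's published form).

    lam_homo * ||(w_1j, ..., w_Kj)||_2 groups variable j across all datasets;
    lam_het * sum_k ||w_kj||_2 lets each dataset drop variable j on its own.
    """
    total = 0.0
    for j in range(len(W[0])):
        across = [x for Wk in W for x in Wk[j]]
        total += lam_homo * l2(across)
        total += lam_het * sum(l2(Wk[j]) for Wk in W)
    return total


def prox_integrative(W: Weights, step: float, lam_homo: float, lam_het: float) -> Weights:
    """Proximal step for integrative_penalty with step size `step`.

    The groups are nested (each per-dataset group sits inside the
    across-dataset group), so the proximal operator is obtained by shrinking
    the inner groups first and the enclosing group second.
    """
    K, p = len(W), len(W[0])
    out: Weights = [[[] for _ in range(p)] for _ in range(K)]
    for j in range(p):
        # Inner groups: per-dataset shrinkage (heterogeneity).
        inner = [group_shrink(W[k][j], step * lam_het) for k in range(K)]
        # Outer group: joint shrinkage across datasets (homogeneity).
        flat = group_shrink([x for v in inner for x in v], step * lam_homo)
        pos = 0
        for k in range(K):
            size = len(inner[k])
            out[k][j] = flat[pos:pos + size]
            pos += size
    return out


def selected(W: Weights) -> List[List[int]]:
    """Indices of variables with nonzero input weights, per dataset."""
    return [[j for j, w in enumerate(Wk) if l2(w) > 0.0] for Wk in W]


# Toy example (made-up weights): K = 2 datasets, p = 3 variables, 2 hidden units.
W = [
    [[3.0, 4.0], [0.1, 0.0], [0.0, 0.3]],  # dataset 1
    [[0.0, 2.0], [0.2, 0.0], [1.2, 1.6]],  # dataset 2
]
W_new = prox_integrative(W, step=1.0, lam_homo=0.5, lam_het=0.5)
print(selected(W_new))  # [[0], [0, 2]]
```

In this toy run, variable 1 is dropped in both datasets, variable 0 is kept in both, and variable 2 survives only in dataset 2, which is the kind of dataset-specific selection that naive pooling cannot express. In a full training loop, a step like this would typically follow each gradient step on the network loss (proximal gradient descent); whether PIN optimizes its penalty this way is not stated in the abstract.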

Funder

National Natural Science Foundation of China

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/qub2.51
