Author:
Ho Ho Yi Alexis, Xu Shuoxun, Guo Xinzhou
Abstract
High-throughput technologies in bioscience have pushed us into an era with high dimensionality. Swamped by thousands of predictors, detecting the valuable signal from the noise in clinical studies becomes challenging. As a common strategy, integrative analysis utilizing similarities across multiple studies might help lift the curse of dimensionality and enhance statistical power. However, due to the growing concern about individual data privacy, data-sharing constraints are often imposed in integrative analysis. These might lead to results inequivalent to ones without sharing constraints and reduce statistical power in integrative analyses. In this paper, built on Abess, we propose an integrative analysis method to estimate the site-specific parameters in the presence of high dimensional nuisance parameters in multi-site studies. Implemented with a carefully designed $L_{2,0}$ penalization on nuisance parameters, the proposed method satisfies both the DataSHIELD constraint, which only allows the transmission of summary statistics from sites, and the equivalence property that the solution is exactly the same as the solution merging all datasets into one on a single location. Assuming the nuisance parameters share a common support, the proposed method has support recovery and selection consistency with high probability and exhibits improved estimation accuracy on the site-specific parameters and low computational cost in numerical experiments. We demonstrate the merit of the proposed method by investigating the relationship between the CD8 T cell count and the treatment effect of zidovudine-incorporated therapy in the AIDS Clinical Trials Group Study 175.
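The equivalence between a summary-statistics solution and a pooled-data solution can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a linear model with squared-error loss, and the helper names (`site_summary`, `rss_from_summary`, `rss_from_raw`) and the simulated data are hypothetical. For a candidate common support, each site fits its own coefficients, and the total residual sum of squares computed from per-site summary statistics alone matches the value computed from the raw rows.

```python
import numpy as np

def site_summary(X, y):
    """Summary statistics a site could transmit instead of individual-level rows."""
    return X.T @ X, X.T @ y, float(y @ y)

def rss_from_summary(summary, support):
    """Residual sum of squares of the least-squares fit on `support`,
    computed from summary statistics only: y'y - b' G^{-1} b."""
    gram, xty, yty = summary
    G = gram[np.ix_(support, support)]
    b = xty[support]
    beta = np.linalg.solve(G, b)  # site-specific coefficients on the shared support
    return yty - float(b @ beta)

def rss_from_raw(X, y, support):
    """Same quantity computed with access to the raw rows, for comparison."""
    Xs = X[:, support]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return float(resid @ resid)

# Simulated data (an assumption for illustration): three sites, eight predictors,
# a common support but different coefficient values at each site.
rng = np.random.default_rng(0)
support = [0, 2, 5]
sites = []
for n in (60, 80, 100):
    X = rng.normal(size=(n, 8))
    coef = np.zeros(8)
    coef[support] = rng.normal(size=len(support))
    y = X @ coef + rng.normal(scale=0.5, size=n)
    sites.append((X, y))

total_summary = sum(rss_from_summary(site_summary(X, y), support) for X, y in sites)
total_raw = sum(rss_from_raw(X, y, support) for X, y in sites)
print(np.isclose(total_summary, total_raw))
```

Because the loss for any candidate support depends on each site's data only through these three quantities, a support search in this simplified setting could in principle be run centrally without moving individual records.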
Funder
Hong Kong University of Science and Technology
Publisher
Springer Science and Business Media LLC