Elastic Correlation Adjusted Regression (ECAR) scores for high dimensional variable importance measuring-Reference-Cited by-同舟云学术

Elastic Correlation Adjusted Regression (ECAR) scores for high dimensional variable importance measuring

Published:2021-12 Issue:1 Volume:11 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Zhou Yuan,Fa Botao,Wei Ting,Sun Jianle,Yu Zhangsheng,Zhang Yue

Abstract

AbstractInvestigation of the genetic basis of traits or clinical outcomes heavily relies on identifying relevant variables in molecular data. However, characteristics such as high dimensionality and complex correlation structures of these data hinder the development of related methods, resulting in the inclusion of false positives and negatives. We developed a variable importance measure method, termed the ECAR scores, that evaluates the importance of variables in the dataset. Based on this score, ranking and selection of variables can be achieved simultaneously. Unlike most current approaches, the ECAR scores aim to rank the influential variables as high as possible while maintaining the grouping property, instead of selecting the ones that are merely predictive. The ECAR scores’ performance is tested and compared to other methods on simulated, semi-synthetic, and real datasets. Results showed that the ECAR scores improve the CAR scores in terms of accuracy of variable selection and high-rank variables’ predictive power. It also outperforms other classic methods such as lasso and stability selection when there is a high degree of correlation among influential variables. As an application, we used the ECAR scores to analyze genes associated with forced expiratory volume in the first second in patients with lung cancer and reported six associated genes.

Funder

National Natural Science Foundation of China

Shanghai Phi-losophy and Social Sciences Planning Project

Fundamental Research Funds for the Central Universities

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-021-02706-0.pdf

Reference36 articles.

1. Dashtban, M. & Balafar, M. Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics 109, 91–107 (2017).

2. Cao, J., Zhang, L., Wang, B., Li, F. & Yang, J. A fast gene selection method for multi-cancer classification using multiple support vector data description. J. Biomed. Inform. 53, 381–389 (2015).

3. Degenhardt, F., Seifert, S. & Szymczak, S. Evaluation of variable selection methods for random forests and omics data sets. Brief. Bioinform. 20, 492–503 (2019).

4. Benner, C. et al. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

5. Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).