SpiderLearner: An ensemble approach to Gaussian graphical model estimation

Author:

Shutta Katherine H. (1, 2, 3), Balzer Laura B. (4), Scholtens Denise M. (5), Balasubramanian Raji (1)

Affiliation:

1. Department of Biostatistics and Epidemiology, University of Massachusetts–Amherst, Amherst, Massachusetts, USA

2. Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA

3. Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA

4. Division of Biostatistics, University of California–Berkeley, Berkeley, California, USA

5. Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA

Abstract

Gaussian graphical models (GGMs) are a popular form of network model in which nodes represent features in multivariate normal data and edges reflect conditional dependencies between these features. GGM estimation is an active area of research. Currently available tools for GGM estimation require investigators to make several choices regarding algorithms, scoring criteria, and tuning parameters. An estimated GGM may be highly sensitive to these choices, and the accuracy of each method can vary based on structural characteristics of the network such as topology, degree distribution, and density. Because these characteristics are a priori unknown, it is not straightforward to establish universal guidelines for choosing a GGM estimation method. We address this problem by introducing SpiderLearner, an ensemble method that constructs a consensus network from multiple estimated GGMs. Given a set of candidate methods, SpiderLearner estimates the optimal convex combination of results from each method using a likelihood-based loss function. K-fold cross-validation is applied in this process, reducing the risk of overfitting. In simulations, SpiderLearner performs better than or comparably to the best candidate methods according to a variety of metrics, including relative Frobenius norm and out-of-sample likelihood. We apply SpiderLearner to publicly available ovarian cancer gene expression data including 2013 participants from 13 diverse studies, demonstrating our tool's potential to identify biomarkers of complex disease. SpiderLearner is implemented as flexible, extensible, open-source code in the R package ensembleGGM at https://github.com/katehoffshutta/ensembleGGM.
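The idea of a cross-validated convex combination of precision matrix estimates can be illustrated with a minimal Python sketch. It is not the ensembleGGM implementation and does not reproduce that package's API. Several choices here are assumptions made only for illustration: the candidate estimators are ridge-regularized inverse covariances (stand-ins for real GGM estimators), the loss is the Gaussian negative log-likelihood up to constants evaluated on held-out folds, the weights are fitted by exponentiated-gradient updates on the probability simplex, and every function name is hypothetical.

```python
import numpy as np


def ridge_precision(X, lam):
    """Hypothetical stand-in candidate: invert a ridge-regularized sample covariance."""
    S = np.cov(X, rowvar=False)
    return np.linalg.inv(S + lam * np.eye(S.shape[0]))


def loss_gradient(w, thetas, S_test):
    """Gradient in w of tr(S_test @ Theta_w) - log det(Theta_w),
    where Theta_w = sum_k w[k] * thetas[k]."""
    theta_w = np.tensordot(w, thetas, axes=1)
    G = S_test - np.linalg.inv(theta_w)  # derivative with respect to Theta_w
    return np.array([np.sum(G * t) for t in thetas])


def fit_weights(X, lams, k=5, n_iter=300, step=0.05, seed=0):
    """Estimate convex weights by minimizing the average held-out loss over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), k)
    fold_fits = []
    for test_idx in folds:
        # Fit every candidate on the training part of the fold.
        X_train = np.delete(X, test_idx, axis=0)
        thetas = np.stack([ridge_precision(X_train, lam) for lam in lams])
        # Simplification: the held-out covariance is centered at the held-out mean.
        S_test = np.cov(X[test_idx], rowvar=False, bias=True)
        fold_fits.append((thetas, S_test))
    w = np.full(len(lams), 1.0 / len(lams))  # start from uniform weights
    for _ in range(n_iter):
        grad = sum(loss_gradient(w, t, s) for t, s in fold_fits) / k
        # Exponentiated-gradient step; weights stay non-negative and sum to one.
        w = w * np.exp(-step * (grad - grad.min()))
        w = w / w.sum()
    return w


def ensemble_precision(X, lams, w):
    """Apply the fitted weights to candidates re-estimated on the full data."""
    thetas = np.stack([ridge_precision(X, lam) for lam in lams])
    return np.tensordot(w, thetas, axes=1)


# Simulated data with a known covariance, used only to exercise the sketch.
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(4), np.eye(4) + 0.3, size=200)
lams = [0.01, 0.1, 1.0]
w = fit_weights(X, lams)
theta_hat = ensemble_precision(X, lams, w)
print("weights:", np.round(w, 3))
```

Because each candidate is positive definite and the weights are convex, the combined estimate is itself a valid precision matrix. Fitting the weights on held-out folds rather than on the training data is what guards against simply rewarding the candidate that overfits most.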

Funder

U.S. National Library of Medicine

Publisher

Wiley

Subject

Statistics and Probability, Epidemiology

