HighDimMixedModels.jl: Robust High Dimensional Mixed Models across Omics Data

Author:

Gorstein Evan, Aghdam Rosa, Solís-Lemus Claudia

Abstract

High dimensional mixed-effect models are an increasingly important form of regression in modern biology, in which the number of variables often matches or exceeds the number of samples, which are collected in groups or clusters. The penalized likelihood approach to fitting these models relies on a coordinate gradient descent (CGD) algorithm that lacks guarantees of convergence to a global optimum. Here, we empirically study the behavior of the algorithm across a number of study types common in modern omics. In particular, we study the empirical performance of high dimensional mixed-effect models fit to data simulated to mimic the features of transcriptome, genome-wide association, and microbiome data. In addition, we study the performance of the model on real data from each of these study types. To facilitate these simulations, we implement the algorithm in an open-source Julia package, HighDimMixedModels.jl. We compare the performance of two commonly used penalties, namely LASSO and SCAD, within the HighDimMixedModels.jl framework. Our results demonstrate that the SCAD penalty consistently outperforms LASSO in terms of both variable selection and estimation accuracy across omics data. Through our comprehensive analysis, we illuminate the intricate relationship between algorithmic behavior, penalty selection, and dataset properties such as the correlation structure among features, providing valuable insights for researchers employing high dimensional mixed-effect models in biological investigations.

Author Summary

High dimensional mixed-effect models are increasingly indispensable in modern biology, particularly in omics studies, where the number of variables often equals or surpasses the number of samples, and data are collected in clusters or groups. In our research, we concentrate on the penalized likelihood approach to fitting these models, employing a coordinate gradient descent (CGD) algorithm. While CGD is a widely used optimization technique, it lacks guarantees of convergence to a global optimum, prompting our empirical investigation into its behavior across various study types common in modern omics datasets. Our study provides insights into the performance of high dimensional mixed-effect models fitted to data simulating transcriptome, genome-wide association, and microbiome datasets. Additionally, we evaluate the model's performance on real datasets from each of these study types. To facilitate reproducibility and further research, we have implemented the algorithm in an open-source Julia package, HighDimMixedModels.jl. Notably, HighDimMixedModels.jl stands out as the first package capable of seamlessly handling various omics datasets without errors, offering a user-friendly solution for researchers across disciplines. While numerous software packages are available for implementing high dimensional mixed-effects models on omics data, there is currently no comprehensive review source summarizing all methods. We provide a table summarizing existing methods, available in the Supplementary Material.
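For readers unfamiliar with the two penalties compared above, the following is a minimal Python sketch of their standard textbook definitions for a single coefficient. It is not taken from HighDimMixedModels.jl: the function names `lasso_penalty` and `scad_penalty` are hypothetical, and the shape parameter `a = 3.7` is an assumption based on the conventional default in the SCAD literature, not a value stated in this abstract.

```python
def lasso_penalty(beta, lam):
    """L1 (LASSO) penalty: grows linearly in |beta| without bound."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """Standard SCAD penalty; a > 2 is assumed (3.7 is a conventional choice)."""
    b = abs(beta)
    if b <= lam:
        # Identical to LASSO near zero, so small effects are still shrunk.
        return lam * b
    if b <= a * lam:
        # Quadratic transition that bends the penalty toward a plateau.
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    # Constant beyond a * lam: large coefficients incur no extra penalty.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for beta in (0.5, 2.0, 10.0):
        print(beta, lasso_penalty(beta, 1.0), scad_penalty(beta, 1.0))
```

The two penalties agree for small coefficients, but SCAD levels off while LASSO keeps growing. This is one commonly cited reason SCAD tends to shrink large effects less, although the sketch itself says nothing about the empirical results reported here.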

Publisher

Cold Spring Harbor Laboratory

References: 63 articles.
