High‐Dimensional Overdispersed Generalized Factor Model With Application to Single‐Cell Sequencing Data Analysis

Author:

Nie Jinyu (1), Qin Zhilong (2), Liu Wei (3)

Affiliation:

1. Center of Statistical Research and School of Statistics, Southwestern University of Finance and Economics, Chengdu, China

2. Institute of Western China Economic Research, Southwestern University of Finance and Economics, Chengdu, China

3. School of Mathematics, Sichuan University, Chengdu, China

Abstract

Existing high‐dimensional linear factor models do not accommodate variables of different types, while high‐dimensional nonlinear factor models often overlook the overdispersion present in mixed‐type data. Yet overdispersion is prevalent in practice, particularly in biomedical and genomic studies. To address this need, we propose an overdispersed generalized factor model (OverGFM) for high‐dimensional nonlinear factor analysis of overdispersed mixed‐type data. Our approach adds an error term that captures the overdispersion the factors alone cannot account for. This addition creates significant computational challenges, because the nonlinear model then involves two high‐dimensional latent random matrices. To overcome these challenges, we propose a novel variational EM algorithm that combines Laplace and Taylor approximations. The algorithm yields explicit iterative solutions for the complex variational parameters, and we prove that it has favorable convergence properties. We also develop a criterion based on the singular value ratio for choosing the number of factors, and numerical results demonstrate its effectiveness. Comprehensive simulation studies show that OverGFM outperforms state‐of‐the‐art methods in both estimation accuracy and computational efficiency. We further demonstrate the practical merit of the method by applying it to two genomics datasets. To facilitate its use, we have integrated the implementation of OverGFM into the R package GFM.
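
The singular value ratio idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' criterion as implemented in the GFM package: the abstract does not say which matrix the ratio is computed on, so the sketch assumes, for simplicity, that it is applied to a centered Gaussian data matrix. The function name `select_num_factors`, the upper bound `q_max`, and the simulated data are all hypothetical.

```python
import numpy as np

def select_num_factors(X, q_max=10):
    """Pick k in 1..q_max that maximizes sigma_k / sigma_{k+1}.

    Generic singular value ratio rule, shown for illustration only;
    the paper's criterion may be defined on a different matrix.
    """
    # Center columns so the leading singular value does not just capture the mean.
    Xc = X - X.mean(axis=0, keepdims=True)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    q_max = min(q_max, len(s) - 1)
    ratios = s[:q_max] / s[1:q_max + 1]
    return int(np.argmax(ratios)) + 1, ratios

# Simulated example (assumed setup): 3 latent factors plus Gaussian noise.
rng = np.random.default_rng(0)
n, p, q_true = 200, 50, 3
H = rng.normal(size=(n, q_true))   # latent factors
B = rng.normal(size=(p, q_true))   # loadings
X = H @ B.T + 0.5 * rng.normal(size=(n, p))

q_hat, ratios = select_num_factors(X)
print("selected number of factors:", q_hat)
```

The rule looks for the largest gap between consecutive singular values, which in a well-separated factor structure falls just after the last true factor. For mixed-type or overdispersed data, the raw data matrix would likely be replaced by a model-based estimate.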

Funder

National Natural Science Foundation of China

Fundamental Research Funds for the Central Universities

Publisher

Wiley

