Author:
Bi Wenjian, Zhou Wei, Dey Rounak, Mukherjee Bhramar, Sampson Joshua N, Lee Seunggeun
Abstract
In genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, because suitable analysis tools have been lacking, methods designed for binary and quantitative traits have often been applied inappropriately to categorical phenotypes, which inflates type I error rates or reduces power. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, the Proportional Odds Logistic Mixed Model (POLMM). We demonstrate that POLMM is computationally efficient enough to analyze large datasets with hundreds of thousands of genetically related samples, controls type I error rates at a stringent significance level regardless of the phenotypic distribution, and is more powerful than alternative methods. We applied POLMM to 258 ordinal categorical phenotypes in array-genotyped and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 (7.2%) are rare variants with minor allele frequency (MAF) < 0.01.
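To make the idea of a proportional odds model concrete, here is a minimal sketch of how category probabilities follow from cumulative logits. It is an illustration only, not the authors' implementation. The sign convention logit P(Y <= j) = cutpoint_j - eta, the function names, and the decomposition of eta into covariate, genotype, and random-effect terms are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(covariates, beta, genotype, gamma, random_effect=0.0):
    """Hypothetical eta = x'beta + g*gamma + b for one individual.

    random_effect stands in for a per-individual term that would capture
    sample relatedness in a mixed model (assumed form, for illustration).
    """
    fixed = sum(x * b for x, b in zip(covariates, beta))
    return fixed + genotype * gamma + random_effect


def category_probs(cutpoints, eta):
    """Return P(Y = j) for j = 1..J given J-1 increasing cutpoints.

    Assumes logit P(Y <= j) = cutpoints[j-1] - eta, so a larger eta
    shifts probability mass toward higher categories.
    """
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Example: four ordered categories, one covariate, one variant (made-up values).
eta = linear_predictor([1.2], [0.4], genotype=2, gamma=0.1)
probs = category_probs([-1.0, 0.5, 2.0], eta)
```

The same effect size gamma applies at every cutpoint, which is the "proportional odds" assumption: the variant shifts all cumulative log-odds by the same amount instead of needing a separate effect per category.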
Publisher
Cold Spring Harbor Laboratory