Abstract
Semicontinuous data, characterized by an excess of zeros together with a non-negative, right-skewed distribution of positive values, are frequently observed in biomedical research. Different statistical models have been proposed to investigate the association of covariates with such outcomes. Motivated by the search for genetic factors associated with Neutrophil Extracellular Traps (NETs), a semicontinuous biomarker involved in thrombosis, we investigated the impact of the model chosen for a semicontinuous trait in the context of a Genome Wide Association Study (GWAS). We compared three models that jointly model zero and positive values while allowing the estimation of a single parameter for the association of a covariate with the overall mean: Tobit, Negative Binomial and Compound Poisson-Gamma. We assessed the fit of these models in a sample of 657 participants of the FARIVE study whose NETs plasma levels were measured. For each of the three models, we performed a GWAS on NETs in FARIVE participants and compared the results. A simulation study was also conducted to evaluate type I error control. The Compound Poisson-Gamma and Negative Binomial models fitted the NETs data observed in FARIVE better than the Tobit model. However, the Negative Binomial model suffered from an inflated type I error, attributable to extreme positive NETs values and low-frequency variants. Conversely, the Compound Poisson-Gamma model was robust to both phenomena. Using the latter model, a GWAS identified a genome-wide significant locus on chr21q21.3. The lead variant was rs57502213, a two-nucleotide deletion located ∼40 kb upstream of the non-coding RNA miR155HG, which hosts miR-155, recently reported to play a role in NETs formation. This work indicates that the modeling strategy for a semicontinuous outcome is crucial in the framework of GWAS.
The choice of model should take into account the nature of the process generating the zero values and the presence of extreme values. Our work also suggests that the Compound Poisson-Gamma model, although still rarely used, can be a robust modeling strategy for GWAS of a semicontinuous trait.
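To illustrate how the Compound Poisson-Gamma model produces exact zeros and a right-skewed positive part within a single distribution, here is a minimal simulation sketch. It assumes the usual Tweedie parameterization (mean `mu`, dispersion `phi`, power `1 < p < 2`), under which the outcome is a sum of N Gamma-distributed terms with N drawn from a Poisson distribution. A zero occurs exactly when N = 0, with probability exp(-λ). The function names (`tweedie_to_cpg`, `cpg_draw`, `simulate_trait`), the log link and all parameter values are hypothetical choices for illustration; this is not the authors' code or their simulation design.

```python
import math
import random


def tweedie_to_cpg(mu, phi, p):
    """Map Tweedie (mean mu, dispersion phi, power 1 < p < 2) to the
    compound Poisson-Gamma parameters (Poisson rate, Gamma shape, Gamma scale)."""
    if not 1.0 < p < 2.0:
        raise ValueError("power p must lie strictly between 1 and 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    shape = (2.0 - p) / (p - 1.0)
    scale = phi * (p - 1.0) * mu ** (p - 1.0)
    # Implied moments: mean = lam*shape*scale = mu, variance = phi * mu**p
    return lam, shape, scale


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def cpg_draw(mu, phi, p, rng):
    """One compound Poisson-Gamma draw: the sum of N Gamma variables with
    N ~ Poisson(lam). When N == 0 the sum is exactly 0 (the point mass at zero)."""
    lam, shape, scale = tweedie_to_cpg(mu, phi, p)
    n = poisson_draw(lam, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))


def simulate_trait(genotypes, beta0, beta1, phi, p, rng):
    """Hypothetical data-generating model: log(mean) = beta0 + beta1 * genotype.
    A single parameter, beta1, acts on the overall mean of the trait
    (zeros and positive values together)."""
    return [cpg_draw(math.exp(beta0 + beta1 * g), phi, p, rng) for g in genotypes]


if __name__ == "__main__":
    rng = random.Random(2024)
    mu, phi, p = 5.0, 2.0, 1.5
    lam, _, _ = tweedie_to_cpg(mu, phi, p)
    draws = [cpg_draw(mu, phi, p, rng) for _ in range(20000)]
    zero_frac = sum(d == 0.0 for d in draws) / len(draws)
    print("theoretical P(Y=0):", round(math.exp(-lam), 3))
    print("empirical   P(Y=0):", round(zero_frac, 3))
    print("empirical mean:", round(sum(draws) / len(draws), 3), "target:", mu)
```

With `beta1 = 0` the simulated trait is independent of genotype, which is the null setting a type I error simulation would draw from. Such data could then be analyzed with a Tweedie GLM using a log link, for example the `Tweedie` family of statsmodels with `var_power` between 1 and 2 (not shown). By contrast, a Tobit model treats zeros as left-censored values of a latent Gaussian variable, which is one reason the mechanism assumed to generate the zeros matters when choosing a model.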
Publisher
Cold Spring Harbor Laboratory