Abstract
Large-scale agnostic association analyses based on existing observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, particular challenges of non-probability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. In general, the extent of the bias that may be introduced by ignoring these factors is unknown. In this paper, we develop a statistical framework for characterizing the degree of bias expected in association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis type approach, this framework can be used to obtain plausible values for parameters of interest given results obtained from standard naive analysis methods under varying degrees of misclassification and sampling biases. We develop an online tool for performing this sensitivity analysis in some special cases that occur frequently. Simulations demonstrate promising properties of the proposed way of characterizing biases. We apply our approach to study bias in genetic association studies using data from the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine.
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.