Self-report inaccuracy in the UK Biobank: Impact on inference and interplay with selective participation

Published: 2023-10-06

Author:

Schoeler Tabea, Pingault Jean-Baptiste, Kutalik Zoltán

Abstract

While the use of short self-report measures is common practice in biobank initiatives, such a phenotyping strategy is inherently prone to reporting errors. In this work, we aimed to explore challenges related to self-report errors for biobank-scale research.

We derived a reporting error score (RESUM) for n = 73,129 UK Biobank (UKBB) participants, capturing inconsistent self-reporting in time-invariant phenotypes across multiple measurement occasions. We then performed genome-wide association scans on RESUM, applied downstream analyses (LD Score Regression and Mendelian Randomization, MR), and compared its properties to those of a previously studied participation behaviour (UKBB participation propensity). The results were then used in extended analyses (simulations, inverse probability and variance weighting) to explore patterns and propose possible corrections for biases induced by reporting error and/or selective participation. Finally, to assess the impact of reporting error on SNP effects and trait heritability, we improved phenotype resolution for 15 self-report measures and inspected the changes in genomic findings.

Reporting error was present in the UKBB across all 33 assessed time-invariant measures, with repeatability levels as low as 11% (e.g., inconsistent recall of childhood sunburns). We found that reporting error was not independent of UKBB participation, as evidenced by their negative genetic correlation (rg = -0.90), their shared causes (e.g., education, income, intelligence; assessed in MR) and the loss in self-report accuracy following participation bias correction. Depending on where reporting error occurred in the analytical pipeline, its impact ranged from reduced power (e.g., for gene discovery) to biased effect estimates (e.g., if present in the exposure variable) and attenuation of genome-wide quantities (e.g., 20% relative h² attenuation for self-reported childhood height).

Our findings highlight that both self-report accuracy and selective participation are competing biases and sources of poor reproducibility for biobank-scale research. Implementation of approaches that aim to enhance phenotype resolution while ensuring sample representativeness is therefore essential when working with biobank data.
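
The distinction drawn in the abstract between error in the exposure (biased effect estimates) and error in the outcome (reduced power) can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes classical additive measurement error, a single standardized exposure, and hypothetical names (`ols_slope`, `simulate`, `error_sd`) chosen purely for illustration. Under these assumptions, error in the exposure shrinks the regression slope by the reliability ratio var(x) / (var(x) + var(error)), whereas error in the outcome leaves the slope unbiased on average and only makes it noisier.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_beta=0.5, error_sd=1.0, seed=1):
    """Estimate the slope with no error, error in the exposure, and error in the outcome.

    Assumes classical (additive, independent, mean-zero) reporting error.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]                   # true exposure, variance 1
    y = [true_beta * xi + rng.gauss(0, 1) for xi in x]        # true outcome
    x_reported = [xi + rng.gauss(0, error_sd) for xi in x]    # exposure with reporting error
    y_reported = [yi + rng.gauss(0, error_sd) for yi in y]    # outcome with reporting error
    return {
        "no_error": ols_slope(x, y),
        "error_in_exposure": ols_slope(x_reported, y),
        "error_in_outcome": ols_slope(x, y_reported),
    }


results = simulate()
# Reliability of the reported exposure when var(x) = 1 and error_sd = 1.
reliability = 1.0 / (1.0 + 1.0 ** 2)
for name, slope in results.items():
    print(f"{name}: {slope:.3f}")
print(f"expected attenuated slope: {0.5 * reliability:.3f}")
```

With the default settings the reliability is 0.5, so the slope estimated from the error-prone exposure should land near half of the true value, while the slope estimated with an error-prone outcome should stay near the true value. Real self-report error need not be classical (it may be systematic or correlated with participation, as the abstract argues), so this is only a baseline picture.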

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. Breaking down causes, consequences, and mediating effects of age-related telomere shortening on human health (2024-01-13)