DeepKin: precise estimation of in-depth relatedness and its application in UK Biobank

Published: 2024-05-01

Author:

Zhang Qi-Xin, Jayasinghe Dovini, Lee Sang Hong, Xu Hai-Ming, Chen Guo-Bo

Abstract

Accurately estimating relatedness between samples is crucial in genetic and epidemiological analyses. Using genome-wide single nucleotide polymorphisms (SNPs), it is now feasible to measure realized relatedness even in the absence of a pedigree. However, the sampling variation in SNP-based measures and the factors affecting method-of-moments relatedness estimators have not been fully explored, while static cut-off thresholds have been used for decades to classify relatedness levels. Here, we introduce deepKin, a moment-based relatedness estimation and inference framework that incorporates data-specific cut-off threshold determination. It addresses the limitations of previous moment estimators by leveraging the sampling variance of the estimator to provide statistical inference and classification. Key principles in relatedness estimation and inference are provided, including inferring the critical value required to reject the hypothesis of unrelatedness, which we refer to as the deepest significant relatedness, determining the minimum effective number of markers, and understanding the impact on statistical power. Through simulations, we demonstrate that deepKin accurately infers both unrelated pairs and relatives with the support of sampling variance. We then apply deepKin to two subsets of the UK Biobank dataset. In the 3K Oxford subset, tested with four sets of SNPs, the SNP set with the largest effective number of markers, and correspondingly the smallest expected sampling variance, exhibits the most powerful inference for distant relatives. In the 430K British White subset, deepKin identifies 212,120 pairs of significant relatives and classifies them into six degrees. Additionally, cross-cohort significant relative ratios among 19 assessment centers located in different cities are geographically correlated, while within-cohort analyses indicate both an increase in close relatedness and a potential increase in diversity from north to south throughout the UK. Overall, deepKin presents a novel framework for accurate relatedness estimation and inference in biobank-scale datasets. For biobank-scale applications, we have implemented deepKin as an R package, available in the GitHub repository (https://github.com/qixininin/deepKin).
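The idea of a data-specific threshold can be illustrated with a minimal Python sketch. It is not the deepKin R package and does not reproduce the formulas of the paper. It assumes a generic moment estimator (the mean product of standardized genotypes) and assumes that, for an unrelated pair, the estimator is approximately normal with mean 0 and variance 1/m_e, where m_e is the effective number of markers. The function names `relatedness`, `critical_value`, `min_effective_markers`, and `simulate_genotypes` are hypothetical and introduced only for illustration.

```python
from statistics import NormalDist
import random


def relatedness(g_i, g_j, freqs):
    """Moment estimator: mean product of standardized genotypes (0/1/2 allele counts)."""
    total = 0.0
    for a, b, p in zip(g_i, g_j, freqs):
        var = 2.0 * p * (1.0 - p)  # binomial variance of an allele count
        total += (a - 2.0 * p) * (b - 2.0 * p) / var
    return total / len(freqs)


def critical_value(m_e, alpha=0.05, n_tests=1):
    """One-sided threshold for rejecting unrelatedness.

    Assumption (illustrative): the null variance of the estimator is 1/m_e.
    A Bonferroni correction over n_tests pairs is applied to alpha.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / n_tests)
    return z * (1.0 / m_e) ** 0.5


def min_effective_markers(theta, alpha=0.05, n_tests=1):
    """Smallest m_e whose critical value does not exceed the target relatedness theta."""
    z = NormalDist().inv_cdf(1.0 - alpha / n_tests)
    return (z / theta) ** 2


def simulate_genotypes(freqs, rng):
    """Draw one individual's genotypes at independent markers."""
    return [sum(rng.random() < p for _ in range(2)) for p in freqs]


if __name__ == "__main__":
    rng = random.Random(1)
    m = 2000  # independent markers, so m_e equals m in this toy setting
    freqs = [rng.uniform(0.1, 0.5) for _ in range(m)]
    a = simulate_genotypes(freqs, rng)
    b = simulate_genotypes(freqs, rng)
    print("unrelated pair estimate:", relatedness(a, b, freqs))
    print("critical value:", critical_value(m, alpha=0.05, n_tests=1))
    print("m_e needed to detect 0.0625:", min_effective_markers(0.0625))
```

Under these assumptions, a larger m_e gives a smaller critical value, which is consistent with the abstract's observation that the SNP set with the largest effective number of markers is the most powerful for distant relatives.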

Publisher

Cold Spring Harbor Laboratory
