Abstract
The datasets of large genotyping biobanks and direct-to-consumer genetic testing companies contain many related individuals. Until now, it has been widely accepted that the most distant relationships that can be detected are around fifteen degrees (approximately 8th cousins) and that practical relationship estimates have a ceiling around ten degrees (approximately 5th cousins). However, we show that these assumptions are incorrect and that they are due to a misapplication of relationship estimators. In particular, relationship estimators are applied almost exclusively to putative relatives who have been identified because they share detectable tracts of DNA identically by descent (IBD). However, no existing relationship estimator conditions on the event that two individuals share at least one detectable segment of IBD anywhere in the genome. As a result, the relationship estimates obtained using existing estimators are dramatically biased for distant relationships, inferring all sufficiently distant relationships to be around ten degrees regardless of the depth of the true relationship. Existing relationship estimators are derived under a model that assumes that each pair of related individuals shares a single common ancestor (or mating pair of ancestors). This model breaks down for relationships beyond 10 generations in the past because individuals share many thousands of cryptic common ancestors due to pedigree collapse. We first derive a corrected likelihood that conditions on the event that at least one segment is observed between a pair of putative relatives and we demonstrate that the corrected likelihood largely eliminates the bias in estimates of pairwise relationships and provides a more accurate characterization of the uncertainty in these estimates. We then reformulate the relationship inference problem to account for the fact that individuals share many common ancestors, not just one.
We demonstrate that the most distant relationship that can be inferred using IBD may be 200 degrees or more, rather than ten, extending the time-to-common ancestor from approximately 300 years in the past to approximately 3,000 years in the past or more. This dramatic increase in the range of relationship estimators makes it possible to infer relationships whose common ancestors lived before historical events such as European settlement of the Americas, the Transatlantic Slave Trade, and the rise and fall of the Roman Empire.
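The effect of conditioning on at least one observed segment can be illustrated with a toy calculation. The sketch below is not the likelihood derived in the paper: it assumes, purely for illustration, that the number of detectable IBD segments between two individuals is Poisson distributed, that a pair separated by m meioses through a single ancestor shares on average (35m + 22) / 2^(m-1) segments before thresholding, and that segment lengths are exponential with mean 100/m cM with a 7 cM detection threshold. The constants and function names are hypothetical. Conditioning on at least one segment then replaces the Poisson likelihood with a zero-truncated Poisson likelihood.

```python
import math

# Illustrative assumptions only; none of these values come from the abstract.
GENOME_MORGANS = 35.0   # assumed autosomal genetic map length
N_CHROMOSOMES = 22      # number of autosomes
MIN_CM = 7.0            # assumed minimum detectable segment length (cM)


def expected_segments(m, n_ancestors=1):
    """Expected number of detectable IBD segments for a pair m meioses apart."""
    # Expected segment count through each ancestor, before length thresholding.
    total = n_ancestors * (GENOME_MORGANS * m + N_CHROMOSOMES) / 2 ** (m - 1)
    # Fraction of exponentially distributed segments (mean 100/m cM) above MIN_CM.
    return total * math.exp(-m * MIN_CM / 100.0)


def log_lik(n, m):
    """Unconditional Poisson log-likelihood of observing n segments."""
    lam = expected_segments(m)
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_lik_conditional(n, m):
    """Log-likelihood of n segments given that at least one was observed."""
    lam = expected_segments(m)
    # P(n | n >= 1) = Poisson(n; lam) / (1 - exp(-lam)); expm1 keeps this stable.
    return log_lik(n, m) - math.log(-math.expm1(-lam))


def mle(n, loglik, grid):
    """Grid-search maximum-likelihood estimate of the number of meioses."""
    return max(grid, key=lambda m: loglik(n, m))


grid = range(2, 41)
m_naive = mle(1, log_lik, grid)             # estimate ignoring the ascertainment
m_cond = mle(1, log_lik_conditional, grid)  # estimate conditioning on detection
print(m_naive, m_cond)
```

Under these assumptions, a single observed segment drives the unconditional estimate toward the depth at which the expected segment count is close to one, whatever the true relationship. The conditional likelihood instead keeps increasing with depth across the grid, because one shared segment is the typical outcome for any distant pair that was detected at all. A count-only model like this cannot localize a distant relationship by itself; it only shows where the apparent ceiling comes from.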
Publisher
Cold Spring Harbor Laboratory