
RaPID-Query for Fast Identity by Descent Search and Genealogical Analysis

Published: 2022-02-06

Authors:

Wei Yuan, Naseri Ardalan, Zhi Degui, Zhang Shaojie

Abstract

The size of genetic databases has grown large enough that genetic genealogical search, a process of inferring familial relatedness by identifying DNA matches, has become a viable approach to help individuals find missing family members or law enforcement agencies locate suspects. However, a fast and accurate method is needed to search an out-of-database individual against the millions of individuals in such databases. Most existing approaches offer only all-vs-all matching within a panel. Some prototype algorithms offer 1-vs-all queries from an out-of-panel individual, but they do not tolerate errors. A new method, random projection-based identical-by-descent (IBD) detection (RaPID) query, referred to as RaPID-Query, is introduced to make fast genealogical search possible. RaPID-Query identifies IBD segments between a query haplotype and a panel of haplotypes. By integrating matches over multiple positional Burrows-Wheeler transform (PBWT) indexes, RaPID-Query quickly locates IBD segments of at least a given cutoff length while allowing mismatched sites within those segments. A single query against all UK Biobank autosomal chromosomes can be completed in 2.76 seconds of CPU time on average, with a minimum IBD segment length of 7 cM and a minimum of 700 markers. Using the same criteria, RaPID-Query simultaneously achieves a false negative rate of 0.099 and a false positive rate of 0.017 on a chromosome 20 sequencing panel with 92,296 sites, which is comparable to the state-of-the-art IBD detection method Hap-IBD. In the relatedness degree separation experiments, RaPID-Query distinguishes familial relatedness up to the fourth degree for a given individual pair, with areas under the receiver operating characteristic curve (AUC) of at least 97.28%. RaPID-Query is anticipated to make genealogical search convenient and effective, potentially with the integration of complex inference models.
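The central idea in the abstract, integrating matches found on several randomly projected panels so that an isolated mismatch does not break an IBD segment, can be illustrated with a minimal Python sketch. It assumes a RaPID-style projection (one randomly chosen site per fixed window of sites, with a window reported only if it lies inside a long match in at least `min_runs` of `num_runs` projections). For readability it replaces the PBWT indexes with a brute-force scan of every haplotype, so it costs O(runs x haplotypes x windows) and is not the authors' implementation. All function and parameter names (`rapid_like_query`, `window`, `min_windows`, `num_runs`, `min_runs`) are hypothetical, and segment length is measured in windows rather than in cM or markers.

```python
# Hypothetical illustration of projection-plus-voting IBD search; not RaPID-Query itself.
import random
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open interval [start, end)


def random_projection(num_sites: int, window: int, rng: random.Random) -> List[int]:
    """Choose one random site inside each block of `window` consecutive sites.

    Trailing sites that do not fill a whole window are ignored.
    """
    return [w * window + rng.randrange(window) for w in range(num_sites // window)]


def long_window_matches(query: Sequence[int], hap: Sequence[int],
                        picks: Sequence[int], min_windows: int) -> List[Segment]:
    """Maximal runs of windows whose projected site agrees between query and hap,
    keeping only runs that span at least `min_windows` windows."""
    runs: List[Segment] = []
    start = None
    for w, site in enumerate(picks):
        if query[site] == hap[site]:
            if start is None:
                start = w
        elif start is not None:
            if w - start >= min_windows:
                runs.append((start, w))
            start = None
    if start is not None and len(picks) - start >= min_windows:
        runs.append((start, len(picks)))
    return runs


def rapid_like_query(query: Sequence[int], panel: Sequence[Sequence[int]],
                     window: int, min_windows: int, num_runs: int,
                     min_runs: int, seed: int = 0) -> Dict[int, List[Segment]]:
    """Report, per panel haplotype, site intervals supported by >= min_runs projections.

    Brute-force stand-in for a PBWT index: every run scans every haplotype directly.
    """
    rng = random.Random(seed)
    num_windows = len(query) // window
    # support[h][w]: number of runs in which window w of haplotype h lay in a long match
    support = [[0] * num_windows for _ in panel]
    for _ in range(num_runs):
        picks = random_projection(len(query), window, rng)
        for h, hap in enumerate(panel):
            for start, end in long_window_matches(query, hap, picks, min_windows):
                for w in range(start, end):
                    support[h][w] += 1

    results: Dict[int, List[Segment]] = {}
    for h, counts in enumerate(support):
        segments: List[Segment] = []
        start = None
        # Step one past the last window so a segment reaching the end is closed.
        for w in range(num_windows + 1):
            voted = w < num_windows and counts[w] >= min_runs
            if voted and start is None:
                start = w
            elif not voted and start is not None:
                if w - start >= min_windows:
                    segments.append((start * window, w * window))  # back to site indices
                start = None
        if segments:
            results[h] = segments
    return results


if __name__ == "__main__":
    gen = random.Random(42)
    query = [gen.randint(0, 1) for _ in range(200)]
    carrier = list(query)
    carrier[105] ^= 1  # one genotyping error inside an otherwise shared segment
    unrelated = [1 - a for a in query]
    print(rapid_like_query(query, [carrier, unrelated], window=10,
                           min_windows=15, num_runs=10, min_runs=5))
```

In this example the carrier haplotype differs from the query at a single site. A run that happens to sample that site sees two pieces shorter than `min_windows` and contributes no support, but with `window=10` that happens in roughly one run in ten, so the vote across runs will usually still recover the full shared segment, while the unrelated haplotype is never reported. A practical tool would obtain each run's long matches from a PBWT index over the projected panel instead of rescanning every haplotype.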

Publisher

Cold Spring Harbor Laboratory

References (26 articles)

1. A community-maintained standard library of population genetic models

2. Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data

3. The UK Biobank resource with deep phenotyping and genomic data

4. A second generation human haplotype map of over 3.1 million SNPs

5. Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT)