Abstract
AbstractThe size of genetic databases has grown large enough such that, genetic genealogical search, a process of inferring familial relatedness by identifying DNA matches, has become a viable approach to help individuals finding missing family members or law enforcement agencies locating suspects. However, a fast and accurate method is needed to search an out-of-database individual against millions of individuals in such databases. Most existing approaches only offer all-vs-all within panel match. Some prototype algorithms offer 1-vs-all query from out-of-panel individual, but they do not tolerate errors. A new method, random projection-based identical-by-descent (IBD) detection (RaPID) query, referred as RaPID-Query, is introduced to make fast genealogical search possible. RaPID-Query method identifies IBD segments between a query haplotype and a panel of haplotypes. By integrating matches over multiple PBWT indexes, RaPID-Query method is able to locate IBD segments quickly with a given cutoff length while allowing mismatched sites in IBD segments. A single query against all UK biobank autosomal chromosomes can be completed within 2.76 seconds CPU time on average, with the minimum 7 cM IBD segment length and minimum 700 markers. Using the same criteria, RaPID-Query can achieve 0.099 false negative rate and 0.017 false positive rate at the same time on a chromosome 20 sequencing panel having 92,296 sites, which is comparable to the state-of-the-art IBD detection method Hap-IBD. For the relatedness degree separation experiments, RaPID-Query is able to distinguish up to fourth degree of the familial relatedness for a given individual pair, and the area under the receiver operating characteristic curve values are at least 97.28%. It is anticipated that RaPID-Query will make genealogical search convenient and effective, potentially with the integration of complex inference models.
Publisher
Cold Spring Harbor Laboratory