Assessing and mitigating privacy risks of sparse, noisy genotypes by local alignment to haplotype databases

Published: 2023-12, Issue: 12, Volume: 33, Pages: 2156-2173
ISSN: 1088-9051
Container-title: Genome Research
Language: en
Short-container-title: Genome Res.

Authors:

Emani Prashant S., Geradi Maya N., Gürsoy Gamze, Grasty Monica R., Miranker Andrew, Gerstein Mark B.

Abstract

Single nucleotide polymorphisms (SNPs) from omics data create a reidentification risk for individuals and their relatives. Although the ability of thousands of SNPs (especially rare ones) to identify individuals has been repeatedly shown, the availability of small sets of noisy genotypes, from environmental DNA samples or functional genomics data, motivated us to quantify their informativeness. We present a computational tool suite, termed Privacy Leakage by Inference across Genotypic HMM Trajectories (PLIGHT), using population-genetics-based hidden Markov models (HMMs) of recombination and mutation to find piecewise alignment of small, noisy SNP sets to reference haplotype databases. We explore cases in which query individuals are either known to be in the database, or not, and consider several genotype queries, including those from environmental sample swabs from known individuals and from simulated “mosaics” (two-individual composites). Using PLIGHT on a database with ∼5000 haplotypes, we find for common, noise-free SNPs that only ten are sufficient to identify individuals, ∼20 can identify both components in two-individual mosaics, and 20–30 can identify first-order relatives. Using noisy environmental-sample-derived SNPs, PLIGHT identifies individuals in a database using ∼30 SNPs. Even when the individuals are not in the database, local genotype matches allow for some phenotypic information leakage based on coarse-grained SNP imputation. Finally, by quantifying privacy leakage from sparse SNP sets, PLIGHT helps determine the value of selectively sanitizing released SNPs without explicit assumptions about population membership or allele frequency. To make this practical, we provide a sanitization tool to remove the most identifying SNPs from genomic data.
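The abstract describes piecewise alignment of a small SNP set to a reference haplotype database using an HMM of recombination and mutation. The sketch below illustrates that general idea only. It is not PLIGHT's code. It implements a Li-Stephens-style copying model for a single haploid query, decoded with the Viterbi algorithm. The haploid simplification, the uniform switch model, the constant rates `rho` and `mu`, the toy panel, and the function name `li_stephens_viterbi` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def li_stephens_viterbi(
    haplotypes: Sequence[str],
    query: str,
    rho: float = 0.01,
    mu: float = 0.01,
) -> Tuple[List[int], float]:
    """Viterbi path of reference haplotypes copied by a haploid query (illustrative).

    haplotypes: K reference haplotypes as '0'/'1' strings over the same S SNPs.
    query: the query's alleles at those S SNPs.
    rho: assumed constant probability of a switch (recombination) between sites.
    mu: assumed constant probability that the query allele differs from the copied one.
    Returns the index of the copied haplotype at each site and the path log-probability.
    """
    K, S = len(haplotypes), len(query)
    if K == 0 or S == 0:
        raise ValueError("need at least one haplotype and one SNP")
    if any(len(h) != S for h in haplotypes):
        raise ValueError("every haplotype must cover the query's SNPs")
    if not (0 < rho < 1 and 0 < mu < 1):
        raise ValueError("rho and mu must lie strictly between 0 and 1")

    # Li-Stephens-style transitions: with probability rho, re-draw the copied
    # haplotype uniformly from all K (which may pick the same one again).
    log_stay = math.log((1 - rho) + rho / K)
    log_jump = math.log(rho / K)
    log_match, log_mismatch = math.log(1 - mu), math.log(mu)

    def log_emit(k: int, s: int) -> float:
        return log_match if haplotypes[k][s] == query[s] else log_mismatch

    # Uniform prior over the haplotype copied at the first site.
    score = [-math.log(K) + log_emit(k, 0) for k in range(K)]
    backptrs: List[List[int]] = []

    for s in range(1, S):
        # Since log_stay >= log_jump, only the best previous state can win a jump,
        # which keeps each step O(K) instead of O(K^2).
        best_prev = max(range(K), key=score.__getitem__)
        jump_score = score[best_prev] + log_jump
        new_score: List[float] = []
        ptr: List[int] = []
        for k in range(K):
            stay_score = score[k] + log_stay
            if stay_score >= jump_score:
                new_score.append(stay_score + log_emit(k, s))
                ptr.append(k)
            else:
                new_score.append(jump_score + log_emit(k, s))
                ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)

    # Trace back from the highest-scoring final state.
    last = max(range(K), key=score.__getitem__)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    panel = ["0000011111", "1111100000", "0101010101"]
    # A toy "mosaic" query: haplotype 0 on the first five SNPs, haplotype 1 on the rest.
    path, logp = li_stephens_viterbi(panel, "0000000000")
    print(path)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

On the toy mosaic query, the decoded path switches from haplotype 0 to haplotype 1 exactly once. A single switch costs far less than the five allele mismatches that copying either haplotype throughout would incur. For an unphased diploid genotype, a natural extension treats pairs of haplotypes as hidden states, at the cost of a K²-sized state space. That extension is not shown here.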

Funder

National Institutes of Health

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

Cited by 1 article

1. From Sequence to Solution: Intelligent Learning Engine Optimization in Drug Discovery and Protein Analysis; BioTech; 2024-09-01