Author:
Wan Shibiao, Wang Jieqiong
Abstract
Technological advances in recent decades have made sequencing a person's whole genome feasible and affordable. As a result, individual genomic sequences are now produced and collected at large scale for genetic medical diagnosis and cancer drug discovery, which poses serious challenges to the protection of personal genomic privacy. Methods that keep personal genomic data both usable and confidential are urgently needed. Existing genomic privacy-protection methods either require time-consuming encryption or recover data with low accuracy. To tackle these problems, this paper proposes a sequence-similarity-based obfuscation method, IterMegaBLAST, for fast and reliable protection of personal genomic privacy. Specifically, given a sequence selected at random from a dataset of genomic sequences, we first use MegaBLAST to find its most similar sequence in the dataset. The two aligned sequences form a cluster, for which an obfuscated sequence is generated via a DNA generalization lattice scheme. These steps are repeated until all sequences in the dataset are clustered and their obfuscated sequences are generated. Experimental results on benchmark datasets demonstrate that, under the same degree of anonymity, IterMegaBLAST significantly outperforms existing state-of-the-art approaches in terms of both utility accuracy and time complexity.
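The iterative pair-and-generalize loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: `similarity` is a stand-in that counts identical positions in equal-length, pre-aligned sequences (the paper uses MegaBLAST for this step); the lattice is assumed to map purines A/G to R, pyrimidines C/T to Y, and any other mismatch to N; and an unpaired leftover sequence is fully suppressed to N, since the abstract does not say how that case is handled. All function names are hypothetical.

```python
import random


def generalize_base(a, b):
    """Return the lowest common generalization of two symbols.

    Assumed lattice: A/G -> R (purine), C/T -> Y (pyrimidine), otherwise N.
    """
    if a == b:
        return a
    pair = {a, b}
    if pair <= {"A", "G", "R"}:
        return "R"
    if pair <= {"C", "T", "Y"}:
        return "Y"
    return "N"


def generalize(s1, s2):
    """Obfuscate two aligned, equal-length sequences position by position."""
    return "".join(generalize_base(a, b) for a, b in zip(s1, s2))


def similarity(s1, s2):
    """Stand-in for a MegaBLAST score: number of identical positions."""
    return sum(a == b for a, b in zip(s1, s2))


def iterative_pair_obfuscate(seqs, seed=0):
    """Pick a random sequence, pair it with its most similar remaining
    sequence, generalize the pair, and repeat until none are left."""
    rng = random.Random(seed)
    remaining = list(range(len(seqs)))
    result = [None] * len(seqs)
    while len(remaining) >= 2:
        i = remaining.pop(rng.randrange(len(remaining)))
        j = max(remaining, key=lambda k: similarity(seqs[i], seqs[k]))
        remaining.remove(j)
        result[i] = result[j] = generalize(seqs[i], seqs[j])
    if remaining:
        # Assumption: a leftover unpaired sequence is fully suppressed.
        last = remaining[0]
        result[last] = "N" * len(seqs[last])
    return result


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "TTTT", "TTTC"]
    print(iterative_pair_obfuscate(data))
```

Each pair shares one obfuscated sequence, so every record in the output is indistinguishable from at least one other record. A real pipeline would replace `similarity` with alignment scores and would have to handle gaps and unequal lengths.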
Funder
National Cancer Institute
Subject
Genetics (clinical), Genetics, Molecular Medicine