Author:
Zhang Qixin, Liu Tianzi, Guo Xinxin, Zhen Jianxin, Bao Kan, Yang Meng-yuan, Khederzadeh Saber, Zhou Fang, Han Xiaotong, Zheng Qiwen, Jia Peilin, Ding Xiaohu, He Mingguang, Zou Xin, Zhang Hongxin, He Ji, Zhu Xiaofeng, Zou Yangyun, Lu Sijia, Lu Daru, Chen Hongyan, Zeng Changqing, Liu Fan, Zheng Hou-Feng, Liu Siyang, Xu Hai-Ming, Chen Guo-Bo
Abstract
Identifying relatives across cohorts is one of the basic routines in genomic data analysis. Because conventional practice often requires explicit sharing of genomic data, it is easily hampered by privacy or ethical constraints. In this study, using our proposed scheme for genomic encryption, we developed encG-reg, a regression approach that can detect relatives of various degrees from encrypted genomic data. The encryption properties of encG-reg are built on random matrix theory: the scheme masks the original genotypic matrix but still provides controllable precision relative to that of direct individual-level genotype data. Having found tractable eighth-order moments for encrypted genotypes, we established the connection between the dimension of the random matrix and the required precision of a study. encG-reg consequently balances i) false positive and false negative rates and ii) the computational cost and the degree of relatives to be searched. We validated encG-reg in 485,158 multi-ethnic UK Biobank samples, where its resolution was comparable with that of conventional methods such as KING. In a more complex application, we launched a carefully designed multi-center collaboration across 6 research institutes in China, covering 11 cohorts of 64,091 GWAS samples. In both examples, encG-reg robustly identified and validated relatives across cohorts, even with diverse ethnic backgrounds and differing genotype quality.
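The idea of masking genotypes with a random matrix while keeping relatedness recoverable can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian random matrix with variance 1/k, the toy sizes `m`, `k`, `n1`, `n2`, the simulated allele frequencies, and the use of an identical sample as the "relative" are all assumptions chosen for illustration. Each cohort multiplies its standardized genotype matrix by the same random matrix `S`, and only the masked matrices are compared.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed): m markers, k columns in the random matrix.
m, k = 2000, 2000
n1, n2 = 5, 5  # samples in cohort 1 and cohort 2

# Assumed allele frequencies shared by both cohorts.
freq = rng.uniform(0.1, 0.5, size=m)

def simulate_genotypes(n):
    """Draw n individuals with 0/1/2 genotype counts at m markers."""
    return rng.binomial(2, freq, size=(n, m)).astype(float)

def standardize(G):
    """Centre and scale each marker using the shared allele frequencies."""
    return (G - 2 * freq) / np.sqrt(2 * freq * (1 - freq))

G1 = simulate_genotypes(n1)
G2 = simulate_genotypes(n2)
G2[0] = G1[0]  # plant an identical sample across the two cohorts

X1, X2 = standardize(G1), standardize(G2)

# Shared random matrix: entries N(0, 1/k), so E[S @ S.T] is the identity.
S = rng.normal(0.0, np.sqrt(1.0 / k), size=(m, k))

# Each cohort would share only its masked matrix, not X1 or X2.
E1, E2 = X1 @ S, X2 @ S

direct = X1 @ X2.T / m      # relatedness from raw genotypes
encrypted = E1 @ E2.T / m   # relatedness from masked genotypes

print(np.round(direct[0, 0], 2), np.round(encrypted[0, 0], 2))
```

In this sketch the masked score for the planted pair stays close to the direct score, while unrelated pairs stay near zero. A larger `k` tightens the agreement at higher computational cost, which is the kind of trade-off between random-matrix dimension and precision that the abstract describes.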
Publisher
Cold Spring Harbor Laboratory