Novel Privacy Considerations for Large Scale Proteomics

Authors:

Hill Andrew C.1, Litkowski Elizabeth M.2, Manichaikul Ani3, Yu Bing4, Gorbet Betty A.4, Lange Leslie5, Pratte Katherine A.1, Kechris Katerina J.2, DeCamp Matthew5, Coors Marilyn5, Ortega Victor E.6, Rich Stephen S.3, Rotter Jerome I.7, Gerszten Robert E.8, Clish Clary B.9, Curtis Jeffrey10, Hu Xiaowei3, Ngo Debby11, O'Neal Wanda K.12, Meyers Deborah13, Bleecker Eugene13, Hobbs Brian D.14, Cho Michael H.15, Banaei-Kashani Farnoush16, Guo Claire1, Bowler Russell1

Affiliations:

1. National Jewish Health

2. Colorado School of Public Health

3. Center for Public Health Genomics, University of Virginia

4. Department of Epidemiology and Human Genetics Center, UTHealth School of Public Health

5. University of Colorado

6. Mayo Clinic

7. The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center

8. Division of Cardiovascular Medicine, and Cardiovascular Research Center, Beth Israel Deaconess Medical Center

9. Metabolomics Platform, Broad Institute of Massachusetts Institute of Technology and Harvard

10. University of Michigan

11. Novartis (United States)

12. University of North Carolina at Chapel Hill

13. University of Arizona

14. Channing Division of Network Medicine, Brigham and Women’s Hospital

15. Brigham and Women's Hospital

16. University of Colorado Denver

Abstract

Privacy protection is a core principle of genomic research but not of proteomic research. We identified independent single nucleotide polymorphism (SNP) protein quantitative trait loci (pQTLs) in COPDGene and the Jackson Heart Study (JHS), calculated genotype probabilities from continuous protein levels, and then applied a naïve Bayesian approach to match proteomes to genomes for 2,812 independent subjects from COPDGene, JHS, the SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS), and the Multi-Ethnic Study of Atherosclerosis (MESA). We correctly matched 90%-95% of proteomes to their genome, and for 95%-99% of proteomes the correct genome was among the 1% most likely genomes. Matching accuracy in subjects of African ancestry was lower (~60%) unless the training data included diverse subjects. With larger profiling (SomaScan 5K) in the Atherosclerosis Risk in Communities (ARIC) study, correct identification exceeded 99%, even in mixed-ancestry populations. When serial proteomes are available, the matching algorithm can be used to identify and correct mislabeled samples. This work also demonstrates the importance of including diverse populations in omics research, and shows that large proteomic datasets (> 1,000 proteins) can be accurately linked to a specific genome through pQTL knowledge and therefore should not be considered unidentifiable.
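The matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, hypothetically, that each protein level is normally distributed given the genotype (0, 1 or 2 effect alleles) at its pQTL, with a shared standard deviation and Hardy-Weinberg genotype priors. The function names (`genotype_log_probs`, `match_score`, `rank_genomes`, `in_top_fraction`) and all numbers are illustrative. Per-pQTL genotype probabilities are summed on the log scale under the naïve Bayes independence assumption, and candidate genomes are ranked by that score.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-pQTL model: (mean protein level for 0/1/2 effect alleles,
# residual standard deviation, effect-allele frequency). In practice these
# would be estimated in a training cohort; the Gaussian form is an assumption.
PQTLModel = Tuple[Tuple[float, float, float], float, float]


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def genotype_log_probs(level: float, means: Sequence[float], sd: float, freq: float) -> List[float]:
    """Log P(genotype | protein level) for g = 0, 1, 2 via Bayes' rule.

    Priors follow Hardy-Weinberg proportions; requires 0 < freq < 1.
    """
    priors = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]
    joint = [normal_logpdf(level, means[g], sd) + math.log(priors[g]) for g in range(3)]
    top = max(joint)  # log-sum-exp for numerical stability
    log_norm = top + math.log(sum(math.exp(v - top) for v in joint))
    return [v - log_norm for v in joint]


def match_score(proteome: Sequence[float], genome: Sequence[int],
                model: Sequence[PQTLModel]) -> float:
    """Naive Bayes score: sum of per-pQTL log probabilities, treating pQTLs as independent."""
    return sum(
        genotype_log_probs(level, means, sd, freq)[g]
        for level, g, (means, sd, freq) in zip(proteome, genome, model)
    )


def rank_genomes(proteome: Sequence[float], genomes: Dict[str, Sequence[int]],
                 model: Sequence[PQTLModel]) -> List[str]:
    """Genome IDs ordered from most to least likely match for one proteome."""
    scores = {gid: match_score(proteome, g, model) for gid, g in genomes.items()}
    return sorted(scores, key=scores.__getitem__, reverse=True)


def in_top_fraction(ranking: List[str], true_id: str, fraction: float = 0.01) -> bool:
    """True if the correct genome falls within the top `fraction` of candidates."""
    cutoff = max(1, math.ceil(fraction * len(ranking)))
    return true_id in ranking[:cutoff]


# Toy example: three pQTLs and three candidate genomes (made-up numbers).
model = [((0.0, 1.0, 2.0), 0.5, 0.3), ((0.0, -1.0, -2.0), 0.5, 0.4), ((1.0, 2.0, 3.0), 0.5, 0.5)]
genomes = {"A": (0, 1, 2), "B": (2, 0, 1), "C": (1, 2, 0)}
proteome = (0.1, -0.9, 2.9)  # levels close to those expected under genome A
ranking = rank_genomes(proteome, genomes, model)
print(ranking[0])  # prints "A"
```

In this sketch, "correct match" corresponds to the true genome being ranked first, and `in_top_fraction(ranking, true_id, 0.01)` to the "1% most likely genomes" criterion. Because the scores are simply added across pQTLs, the sketch relies on the pQTLs being independent, consistent with the abstract's use of independent SNPs.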

Publisher

Research Square Platform LLC

