Author:
Stilp Adrienne M,Emery Leslie S,Broome Jai G,Buth Erin J,Khan Alyna T,Laurie Cecelia A,Wang Fei Fei,Wong Quenna,Chen Dongquan,D’Augustine Catherine M,Heard-Costa Nancy L,Hohensee Chancellor R,Johnson William Craig,Juarez Lucia D,Liu Jingmin,Mutalik Karen M,Raffield Laura M,Wiggins Kerri L,de Vries Paul S,Kelly Tanika N,Kooperberg Charles,Natarajan Pradeep,Peloso Gina M,Peyser Patricia A,Reiner Alex P,Arnett Donna K,Aslibekyan Stella,Barnes Kathleen C,Bielak Lawrence F,Bis Joshua C,Cade Brian E,Chen Ming-Huei,Correa Adolfo,Cupples L Adrienne,de Andrade Mariza,Ellinor Patrick T,Fornage Myriam,Franceschini Nora,Gan Weiniu,Ganesh Santhi K,Graffelman Jan,Grove Megan L,Guo Xiuqing,Hawley Nicola L,Hsu Wan-Ling,Jackson Rebecca D,Jaquish Cashell E,Johnson Andrew D,Kardia Sharon L R,Kelly Shannon,Lee Jiwon,Mathias Rasika A,McGarvey Stephen T,Mitchell Braxton D,Montasser May E,Morrison Alanna C,North Kari E,Nouraie Seyed Mehdi,Oelsner Elizabeth C,Pankratz Nathan,Rich Stephen S,Rotter Jerome I,Smith Jennifer A,Taylor Kent D,Vasan Ramachandran S,Weeks Daniel E,Weiss Scott T,Wilson Carla G,Yanek Lisa R,Psaty Bruce M,Heckbert Susan R,Laurie Cathy C
Abstract
Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948–2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.
Publisher
Oxford University Press (OUP)