Abstract
Motivation: The International Mouse Phenotyping Consortium (IMPC) is striving to build a comprehensive functional catalog of mammalian protein-coding genes by systematically producing and phenotyping gene-knockout mice for almost every protein-coding gene in the mouse genome and by testing associations between gene loss-of-function and phenotype. To date, the IMPC has identified over 90,000 gene-phenotype associations, but many phenotypes have not yet been measured for each gene, so the data remain largely incomplete: about 75.6% of association summary statistics are still missing in the latest IMPC summary statistics dataset (IMPC release version 16).
Results: To address this problem, we propose KOMPUTE, a novel method for imputing missing summary statistics in the IMPC dataset. Using the conditional distribution properties of the multivariate normal distribution, KOMPUTE estimates the association Z-scores of unmeasured phenotypes for a particular gene as the conditional expectation given the Z-scores of the measured phenotypes. We evaluate how well the proposed method recovers missing Z-scores using simulated and real-world data sets and compare it to a singular value decomposition (SVD) matrix completion method. Our results show that KOMPUTE outperforms the comparison method across different scenarios.
Availability and implementation: An R package for KOMPUTE is publicly available at https://github.com/statsleelab/kompute, along with usage examples and results for different phenotype domains at https://statsleelab.github.io/komputeExamples.
Contact: leed13@miamioh.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
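The conditional-expectation idea described in the Results can be illustrated with a minimal sketch. It is not the KOMPUTE R package's API. It assumes that the Z-scores of one gene are jointly normal with zero mean and that the phenotype correlation matrix is known, in which case the conditional mean of an unmeasured Z-score is R_um R_mm^{-1} z_m. The function names `solve` and `impute_z` and all numbers are hypothetical, chosen for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest absolute pivot for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_z(z_measured, r_mm, r_um):
    """Conditional mean of one unmeasured Z-score given the measured ones.

    Assumes a zero-mean multivariate normal (an assumption of this sketch):
    r_mm is the correlation matrix among measured phenotypes and r_um holds
    the correlations between the unmeasured phenotype and each measured one.
    """
    # Because r_mm is symmetric, w = r_mm^{-1} r_um gives the weights in
    # E[Z_u | Z_m] = r_um r_mm^{-1} z_m = w . z_m
    w = solve(r_mm, r_um)
    return sum(wi * zi for wi, zi in zip(w, z_measured))


# Hypothetical example: two measured phenotypes, one unmeasured phenotype.
r_mm = [[1.0, 0.5], [0.5, 1.0]]   # correlation among measured phenotypes
r_um = [0.6, 0.3]                 # correlation of unmeasured with measured
z_m = [2.0, 1.0]                  # observed Z-scores
print(impute_z(z_m, r_mm, r_um))  # approximately 1.2
```

In this example the second measured phenotype receives zero weight, because its correlation with the unmeasured phenotype is fully explained by the first measured phenotype.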
Publisher
Cold Spring Harbor Laboratory