The ENCODE Imputation Challenge: A critical assessment of methods for cross-cell type imputation of epigenomic profiles
Author:
Schreiber Jacob, Boix Carles, Lee Jin wook, Li Hongyang, Guan Yuanfang, Chang Chun-Chieh, Chang Jen-Chien, Hawkins-Hooker Alex, Schölkopf Bernhard, Schweikert Gabriele, Carulla Mateo Rojas, Canakoglu Arif, Guzzo Francesco, Nanni Luca, Masseroli Marco, Carman Mark James, Pinoli Pietro, Hong Chenyang, Yip Kevin Y., Spence Jeffrey P., Batra Sanjit Singh, Song Yun S., Mahony Shaun, Zhang Zheng, Tan Wuwei, Shen Yang, Sun Yuanfei, Shi Minyi, Adrian Jessika, Sandstrom Richard, Farrell Nina, Halow Jessica, Lee Kristen, Jiang Lixia, Yang Xinqiong, Epstein Charles, Strattan J. Seth, Snyder Michael, Kellis Manolis, Noble William Stafford, Kundaje Anshul
Abstract
Functional genomics experiments are invaluable for understanding mechanisms of gene regulation. However, comprehensively performing all such experiments, even across a fixed set of sample and assay types, is often infeasible in practice. A promising alternative to performing experiments exhaustively is to, instead, perform a core set of experiments and subsequently use machine learning methods to impute the remaining experiments. However, questions remain as to the quality of the imputations, the best approaches for performing imputations, and even what performance measures meaningfully evaluate performance of such models. In this work, we address these questions by comprehensively analyzing imputations from 23 imputation models submitted to the ENCODE Imputation Challenge. We find that measuring the quality of imputations is significantly more challenging than reported in the literature, and is confounded by three factors: major distributional shifts that arise because of differences in data collection and processing over time, the amount of available data per cell type, and redundancy among performance measures. Our systematic analyses suggest several steps that are necessary, but also simple, for fairly evaluating the performance of such models, as well as promising directions for more robust research in this area.
Publisher
Cold Spring Harbor Laboratory