Abstract
There has been much effort to prioritize genomic variants with respect to their impact on “function”. However, function is often not precisely defined: sometimes it is the disease association of a variant; other times it reflects a molecular effect on transcription or epigenetics. Here we coupled multiple genomic predictors to build GRAM, a generalized model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant in a cell-specific manner. As a first step, we performed feature engineering: using a LASSO-regularized linear model, we found transcription factor (TF) binding to be the most predictive feature, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other functional-impact predictors, contributes almost nothing. Moreover, TF binding inferred from in vitro SELEX is as effective as that inferred from in vivo ChIP-Seq. Second, we implemented GRAM, integrating SELEX features with expression profiles. The program combines a universal regulatory score for a variant in a non-coding element with a modifier score reflecting the particular cell type. We benchmarked GRAM on a large-scale MPRA dataset in the GM12878 cell line, achieving an area under the ROC curve (AUROC) of ∼0.73; performance on the K562 cell line was similar. Finally, we evaluated GRAM on targeted regions using luciferase assays in MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gives very different results, highlighting the importance of carefully defining the functional target the model is predicting.
Author Summary
Non-coding variants lie outside protein-coding regions and have been found to be associated with disease. However, knowledge of the molecular effects of these non-coding variants in a cell-specific context is very limited. Moreover, differing outputs across experimental platforms may add complexity to analyzing the molecular function of these variants. We developed GRAM, a generalized model that predicts the molecular effect of non-coding variants in multiple cell types and across different experimental platforms. We first selected the most informative cell-independent SELEX transcription factor binding scores at the variant locus as features, and then combined them with cell-specific gene expression profiles to build a multi-step prediction model. GRAM was tested on both MPRA and luciferase assays and on three cell lines (GM12878, K562 and MCF7), showing high performance.
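The abstract names three computational ideas that can be illustrated concretely: LASSO-based feature weighting, combining a cell-independent score with a cell-specific modifier, and ROC-based benchmarking. The following is a minimal standard-library Python sketch of those generic techniques under stated assumptions. It is not GRAM's implementation. The function names (`lasso_cd`, `combined_score`, `auroc`), the multiplicative combination rule, and the toy data are all hypothetical. The LASSO fit uses cyclic coordinate descent on centered data. The AUROC is computed as the Mann-Whitney probability that a positive outranks a negative.

```python
# Minimal sketch: L1-regularized (LASSO) feature weighting, a hypothetical
# two-part variant score, and AUROC evaluation. Pure standard library.


def soft_threshold(rho, lam):
    """Soft-thresholding operator S(rho, lam) used in LASSO coordinate descent."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Assumes X (a list of rows) and y are already centered, so no intercept is fit.
    Features whose weight is driven exactly to zero are effectively deselected.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no signal; leave its weight at 0
            rho = 0.0
            for i in range(n):
                # Residual of row i with feature j's own contribution removed.
                partial = y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
            w[j] = soft_threshold(rho / n, lam) / col_sq[j]
    return w


def combined_score(universal, modifier):
    """Hypothetical rule: scale a cell-independent score by a cell-specific modifier."""
    return universal * modifier


def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy centered data in which the target depends only on the first feature.
    X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    y = [2.0, 2.0, -2.0, -2.0]
    print(lasso_cd(X, y, lam=0.5))  # prints [1.5, 0.0]
    # Toy scores for four variants; label 1 marks a hypothetical "expression-modulating" variant.
    universal = [0.2, 0.8, 0.7, 0.9]
    modifier = [0.5, 0.5, 0.5, 1.0]
    scores = [combined_score(u, m) for u, m in zip(universal, modifier)]
    print(auroc(scores, [0, 0, 1, 1]))  # prints 0.75
```

The L1 penalty, unlike a ridge (L2) penalty, can shrink weights to exactly zero. This is why a LASSO fit is a natural tool for asking which feature groups contribute almost nothing to the prediction.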
Publisher
Cold Spring Harbor Laboratory