Characterizing uncertainty in predictions of genomic sequence-to-activity models

Published: 2023-12-23

Authors: Ayesha Bajwa, Ruchir Rastogi, Pooja Kathail, Richard W. Shuai, Nilah M. Ioannidis

Abstract

Genomic sequence-to-activity models are increasingly used to understand gene regulatory syntax and to probe the functional consequences of regulatory variation. Current models make accurate predictions of relative activity levels across the human reference genome, but their performance is more limited when predicting the effects of genetic variants, for example when explaining gene expression variation across individuals. To better understand the causes of these shortcomings, we examine the uncertainty in the predictions of genomic sequence-to-activity models using an ensemble of Basenji2 model replicates. We characterize prediction consistency on four types of sequences: reference genome sequences, reference genome sequences perturbed with transcription factor (TF) motifs, expression quantitative trait loci (eQTLs), and personal genome sequences. We observe that the models tend to make high-confidence predictions on reference sequences, even when those predictions are incorrect, and low-confidence predictions on sequences containing variants. For eQTLs and personal genome sequences, we find that the model replicates make inconsistent predictions in more than 50% of cases. Our findings suggest strategies to improve the performance of these models.
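The abstract does not state exactly how replicate consistency is scored. As a minimal sketch, assuming each replicate's variant effect is its predicted alternate-allele activity minus its predicted reference-allele activity, and assuming a variant counts as "consistent" only when every replicate agrees on the direction of that effect, an ensemble summary could look like this. The function name `ensemble_summary` and the input layout are hypothetical illustrations, not the authors' code or any Basenji2 API.

```python
import numpy as np


def ensemble_summary(ref_preds, alt_preds):
    """Summarize variant-effect predictions across model replicates.

    Hypothetical inputs: ref_preds and alt_preds have shape
    (n_replicates, n_variants) and hold each replicate's predicted
    activity (e.g. one expression track) for the reference and
    alternate allele. Assumes n_replicates >= 2.

    Returns per-variant mean effect, standard deviation of the effect
    across replicates (a simple confidence proxy), and a boolean flag
    that is True only when all replicates agree on a nonzero sign.
    """
    ref = np.asarray(ref_preds, dtype=float)
    alt = np.asarray(alt_preds, dtype=float)
    if ref.shape != alt.shape or ref.ndim != 2:
        raise ValueError("expected two arrays of shape (n_replicates, n_variants)")

    effects = alt - ref                        # per-replicate predicted effect
    mean_effect = effects.mean(axis=0)
    std_effect = effects.std(axis=0, ddof=1)   # sample std across replicates

    signs = np.sign(effects)
    # Consistent = every replicate has the same sign as replicate 0, and that sign is not zero.
    consistent = np.all(signs == signs[0], axis=0) & (signs[0] != 0)
    return mean_effect, std_effect, consistent


if __name__ == "__main__":
    # Toy numbers for three replicates and three variants (not real data).
    ref = [[1.0, 2.0, 3.0], [1.1, 2.0, 2.9], [0.9, 2.1, 3.0]]
    alt = [[1.5, 1.8, 3.1], [1.4, 2.2, 2.8], [1.2, 1.9, 3.05]]
    mean, std, consistent = ensemble_summary(ref, alt)
    print(consistent)                      # per-variant agreement flags
    print(1.0 - consistent.mean())         # fraction of inconsistent variants
```

In the toy example only the first variant has all replicates agreeing on direction, so two of three variants are flagged as inconsistent. Sign agreement is one simple choice; a threshold on `std_effect` relative to `mean_effect` would be an alternative way to separate high- and low-confidence predictions.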

Publisher

Cold Spring Harbor Laboratory