High-throughput deep learning variant effect prediction with Sequence UNET

Published: 2022-05-24

Author:

Dunham Alistair S., Beltrao Pedro, AlQuraishi Mohammed

Abstract

Understanding the consequences of protein-coding mutations is important for many applications in biology and medicine. The vast number of possible mutations across species makes comprehensive experimental characterisation impossible, even with recent high-throughput techniques, so computational prediction of the consequences of variation is essential for many analyses. Previous variant effect prediction (VEP) tools, generally based on evolutionary conservation and protein structure, are often computationally intensive, making them difficult to scale and limiting potential applications. Recent developments in deep learning techniques, including protein language models, and in the scale of biological data have led to a new generation of predictors. These models have improved prediction performance but are often still expensive to run because of slow training steps, hardware requirements and large model sizes. In this work we introduce a new, highly scalable deep learning architecture, Sequence UNET, that classifies and predicts variant frequency directly from protein sequence. The model learns to build representations of protein sequence features at a range of scales using a fully convolutional, U-shaped compression/expansion architecture. We show that it can generalise to pathogenicity prediction, achieving performance on ClinVar comparable to methods including EVE and ESM-1b at greatly reduced computational cost. We further demonstrate its scalability by analysing the consequences of 8.3 billion variants in 904,134 proteins detected in a large-scale proteomics analysis, showing a link between conservation and protein abundance. Sequence UNET can be run on modest hardware through an easy-to-use Python package.
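
To make the "fully convolutional U-shaped compression/expansion" idea concrete, the following is a minimal NumPy sketch of a generic 1D U-shaped network over a one-hot encoded protein sequence. It is not the authors' implementation and does not use the Sequence UNET package: the function names, layer sizes, depth, random untrained weights and the sigmoid output are all assumptions chosen for illustration. It only shows how pooling, upsampling and skip connections map an (L, 20) input to an (L, 20) matrix of per-position, per-amino-acid scores.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a protein sequence as an (L, 20) one-hot matrix."""
    idx = [AMINO_ACIDS.index(a) for a in seq]
    return np.eye(len(AMINO_ACIDS))[idx]


def conv1d(x, w, b):
    """Zero-padded ('same') 1D convolution followed by ReLU.

    x: (L, C_in), w: (K, C_in, C_out) with odd K, b: (C_out,).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([
        np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
        for i in range(x.shape[0])
    ])
    return np.maximum(out + b, 0.0)


def max_pool(x):
    """Halve the length by taking the max over adjacent pairs of positions."""
    if x.shape[0] % 2:
        # Inputs are post-ReLU (non-negative), so a zero row never wins the max.
        x = np.pad(x, ((0, 1), (0, 0)))
    return x.reshape(-1, 2, x.shape[1]).max(axis=1)


def upsample(x, length):
    """Double the length by repetition, then crop to the skip tensor's length."""
    return np.repeat(x, 2, axis=0)[:length]


def init_params(rng, n_in=20, filters=8, kernel=3, depth=2):
    """Random, untrained weights (hypothetical sizes, for shape checking only)."""
    def layer(c_in, c_out, k=kernel):
        return rng.normal(0.0, 0.1, (k, c_in, c_out)), np.zeros(c_out)

    params = {"down": [], "up": []}
    c = n_in
    for d in range(depth):                  # compression path
        params["down"].append(layer(c, filters * 2 ** d))
        c = filters * 2 ** d
    params["bottom"] = layer(c, c * 2)
    c = c * 2
    for d in reversed(range(depth)):        # expansion path
        skip_c = filters * 2 ** d
        params["up"].append(layer(c + skip_c, skip_c))
        c = skip_c
    params["out"] = layer(c, n_in, k=1)     # per-position output layer
    return params


def forward(x, params):
    """Run the U-shaped network; returns an (L, 20) matrix of scores in (0, 1)."""
    skips = []
    for w, b in params["down"]:
        x = conv1d(x, w, b)
        skips.append(x)                     # keep features for the skip connection
        x = max_pool(x)
    x = conv1d(x, *params["bottom"])
    for (w, b), skip in zip(params["up"], reversed(skips)):
        x = upsample(x, skip.shape[0])
        x = conv1d(np.concatenate([x, skip], axis=1), w, b)
    w, b = params["out"]
    logits = x @ w[0] + b                   # kernel size 1: a per-position linear map
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
params = init_params(rng)
scores = forward(one_hot("MKTAYIAKQRQ"), params)
print(scores.shape)
```

Because every layer is convolutional, the same weights apply to sequences of any length, and a single forward pass scores all substitutions at all positions at once. With untrained weights the scores themselves carry no meaning.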

Publisher

Cold Spring Harbor Laboratory

Cited by 3 articles.

1. Domain loss enables evolution of novel functions in a gene superfamily, including snake 3-finger toxins;2022-12-15

2. Nearest neighbor search on embeddings rapidly identifies distant protein relations;Frontiers in Bioinformatics;2022-11-17

3. Nearest neighbor search on embeddings rapidly identifies distant protein relations;2022-09-05