Affiliation:
1. MRC Human Genetics Unit, Institute of Genetics and Cancer University of Edinburgh Edinburgh UK
Abstract
AbstractThe assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top‐performing VEPs are unsupervised methods including EVE, DeepSequence and ESM‐1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking.
Funder
Lister Institute of Preventive Medicine
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computational Theory and Mathematics,General Agricultural and Biological Sciences,General Immunology and Microbiology,General Biochemistry, Genetics and Molecular Biology,Information Systems
Cited by
38 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献