Abstract
In the context of biomedical applications, new link prediction algorithms are continuously being developed and these algorithms are typically evaluated computationally, using test sets generated by sampling the edges uniformly at random. However, as we demonstrate, this creates a bias in the evaluation towards “the rich nodes”, i.e., those with higher degrees in the network. More concerningly, we demonstrate that this bias is prevalent even when different snapshots of the network are used for evaluation as recommended in the machine learning community. This leads to a cycle in research where newly developed algorithms generate more knowledge on well-studied biological entities while the under-studied entities are commonly ignored. To overcome this issue, we propose a weighted validation setting focusing on under-studied entities and present strategies to facilitate bias-aware evaluation of link prediction algorithms. These strategies can help researchers gain better insights from computational evaluations and promote the development of new algorithms focusing on novel findings and under-studied proteins. We provide a web tool to assess the bias in evaluation data at: https://yilmazs.shinyapps.io/colipe/
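The degree bias described above can be illustrated with a minimal sketch. It is not the authors' method: the toy network, the function names, and the inverse-degree weighting are all assumptions chosen only to show why uniformly sampled test edges over-represent high-degree nodes, and how a re-weighting can shift attention towards low-degree ones.

```python
from collections import Counter


def degree_counts(edges):
    """Return the degree of each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def endpoint_share(edges, weights=None):
    """Expected share of sampled-edge endpoints that fall on each node.

    With weights=None every edge is equally likely (uniform sampling).
    """
    if weights is None:
        weights = [1.0] * len(edges)
    total = 2.0 * sum(weights)  # each edge contributes two endpoints
    share = Counter()
    for (u, v), w in zip(edges, weights):
        share[u] += w / total
        share[v] += w / total
    return share


def inverse_degree_weights(edges):
    """Hypothetical re-weighting (an assumption, not the paper's scheme):
    down-weight edges whose endpoints have high degree."""
    deg = degree_counts(edges)
    return [1.0 / (deg[u] * deg[v]) for u, v in edges]


# Toy network: one hub "H" with six neighbours, plus a short path a-b-c.
edges = [("H", f"n{i}") for i in range(6)] + [("a", "b"), ("b", "c")]

uniform = endpoint_share(edges)
weighted = endpoint_share(edges, inverse_degree_weights(edges))

print(f"hub share of test endpoints, uniform:  {uniform['H']:.3f}")
print(f"hub share of test endpoints, weighted: {weighted['H']:.3f}")
```

In this toy network the hub is one of ten nodes, yet it accounts for 6 of the 16 edge endpoints, so a uniformly sampled test set is dominated by it. Under the assumed weighting its share drops, which is the kind of shift a weighted validation setting aims for.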
Publisher
Cold Spring Harbor Laboratory