Abstract
The clinical adoption of small interfering RNAs (siRNAs) has prompted the development of various computational strategies for siRNA design, from traditional data analysis to advanced machine learning techniques. However, previous studies have not adequately considered the full complexity of the siRNA silencing mechanism, neglecting critical elements such as siRNA positioning on mRNA, RNA base-pairing probabilities, and RNA-AGO2 interactions, thereby limiting the insight and accuracy of existing models. Here, we introduce siRNADesign, a Graph Neural Network (GNN) framework that leverages both non-empirical and empirical rule-based features of siRNA and mRNA to effectively capture the complex dynamics of gene silencing. On multiple internal datasets, siRNADesign achieves state-of-the-art performance. Notably, siRNADesign also outperforms existing methodologies in in vitro wet lab experiments and on an externally validated dataset. Additionally, we develop a new data-splitting methodology that addresses data leakage, a frequently overlooked issue in previous studies, ensuring the robustness and stability of our model under various experimental settings. Through rigorous testing, siRNADesign has demonstrated remarkable predictive accuracy and robustness, making significant contributions to the field of gene silencing. Furthermore, by redefining data-splitting standards, our approach aims to set new benchmarks for future research on predictive biological modeling for siRNA.
Publisher
Cold Spring Harbor Laboratory