Author:
Chen Baiming, Huang Hsi-Yuan
Abstract
MicroRNAs (miRNAs) regulate gene expression by binding to mRNAs and either inhibiting translation or promoting mRNA degradation, and they play an important role in the development of diseases. A variety of miRNA target prediction tools are currently available; they analyze sequence complementarity, thermodynamic stability, and evolutionary conservation to predict miRNA-target interactions (MTIs) within the 3’ untranslated region (3’UTR). We propose an additional screening step for sequence-based predicted MTIs that considers the disease semantic similarity between the miRNA and the gene, with the aim of establishing a prediction database of disease-specific MTIs. To calculate disease semantic similarity, we fine-tuned a Sentence-BERT model. The method achieves a recall of 91% on the validation dataset, which comprises the MTIs shared by miRTarBase and miRWalk, that is, predicted MTIs that have also been experimentally verified. The method also generalizes well across different databases. We applied it to calculate the disease semantic similarity for 3,900,041 experimentally validated and predicted MTIs involving 7,727 genes and 1,263 miRNAs. The observed distribution of similarity aligns with existing biological evidence. The study may offer valuable insights into miRNA-gene regulatory networks and help advance disease diagnosis, treatment, and drug development.
Publisher
Cold Spring Harbor Laboratory