Benchmarking the negatives: Effect of negative data generation on the classification of miRNA-mRNA interactions

Published: 2024-08-26 Issue: 8 Volume: 20 Page: e1012385
ISSN: 1553-7358
Container-title: PLOS Computational Biology
Language: en
Short-container-title: PLoS Comput Biol

Author:

Cohen-Davidi Efrat, Veksler-Lublinsky Isana

Abstract

MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally. In animals, this regulation is achieved via base-pairing with partially complementary sequences located mainly in the 3’ UTRs of messenger RNAs (mRNAs). Computational approaches that predict miRNA–target interactions (MTIs) facilitate the process of narrowing down potential targets for experimental validation. The availability of new datasets of high-throughput, direct MTIs has led to the development of machine learning (ML) based methods for MTI prediction. To train an ML algorithm, it is beneficial to provide entries from all class labels (i.e., positive and negative). Currently, no high-throughput assays exist for capturing negative examples. Therefore, current ML approaches must rely on negative examples that are either artificially generated or inferred from experimentally identified positive miRNA–target datasets. Moreover, the lack of uniform standards for generating such data leads to biased results and hampers comparisons between studies. In this comprehensive study, we collected methods for generating negative data for animal miRNA–target interactions and investigated their impact on the classification of true human MTIs. Our study relies on training ML models on a fixed positive dataset in combination with different negative datasets and evaluating their intra- and cross-dataset performance. As a result, we were able to examine each method independently and evaluate ML models’ sensitivity to the methodologies utilized in negative data generation. To achieve a deep understanding of the performance results, we analyzed unique features that distinguish between datasets. In addition, we examined whether one-class classification models that utilize solely positive interactions for training are suitable for the task of MTI classification.
We demonstrate the importance of negative data in MTI classification, analyze specific methodological characteristics that differentiate negative datasets, and highlight the challenge of ML models generalizing interaction rules from training to testing sets derived from different approaches. This study provides valuable insights into the computational prediction of MTIs that can be further used to establish standards in the field.
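
The intra- and cross-dataset evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The two-dimensional synthetic features, the nearest-centroid classifier, and the names `method_A`, `method_B`, and `cross_dataset_grid` are all assumptions made for illustration. The sketch only shows the shape of the experiment: one fixed positive set, several negative sets, and a model trained with each negative set and tested against every negative set.

```python
import random


def make_points(center, n, rng):
    """Draw n synthetic 2-D feature vectors around a center (stand-in for MTI features)."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit(pos, neg):
    """Hypothetical nearest-centroid 'model': the positive and negative class centroids."""
    return centroid(pos), centroid(neg)


def predict(model, x):
    """Return 1 (positive) if x is at least as close to the positive centroid, else 0."""
    c_pos, c_neg = model
    d_pos = (x[0] - c_pos[0]) ** 2 + (x[1] - c_pos[1]) ** 2
    d_neg = (x[0] - c_neg[0]) ** 2 + (x[1] - c_neg[1]) ** 2
    return 1 if d_pos <= d_neg else 0


def accuracy(model, pos, neg):
    """Fraction of correctly labeled examples over positives and negatives together."""
    correct = sum(predict(model, x) == 1 for x in pos)
    correct += sum(predict(model, x) == 0 for x in neg)
    return correct / (len(pos) + len(neg))


def cross_dataset_grid(pos_train, pos_test, negatives):
    """Train with each negative set, test against every negative set.

    negatives maps a method name to (neg_train, neg_test).
    Returns {(train_method, test_method): accuracy}.
    """
    grid = {}
    for train_name, (neg_train, _) in negatives.items():
        model = fit(pos_train, neg_train)
        for test_name, (_, neg_test) in negatives.items():
            grid[(train_name, test_name)] = accuracy(model, pos_test, neg_test)
    return grid


rng = random.Random(0)

# Fixed positive data, shared by every experiment.
pos_train = make_points((0.0, 0.0), 200, rng)
pos_test = make_points((0.0, 0.0), 200, rng)

# Two hypothetical negative-generation methods that differ along different features.
negatives = {
    "method_A": (make_points((4.0, 0.0), 200, rng), make_points((4.0, 0.0), 200, rng)),
    "method_B": (make_points((0.0, 4.0), 200, rng), make_points((0.0, 4.0), 200, rng)),
}

grid = cross_dataset_grid(pos_train, pos_test, negatives)
for (train_name, test_name), acc in sorted(grid.items()):
    kind = "intra" if train_name == test_name else "cross"
    print(f"train={train_name} test={test_name} ({kind}) accuracy={acc:.3f}")
```

In this toy setup the diagonal entries (same method for training and testing) score higher than the off-diagonal ones, because each model learns only the feature that separates positives from its own negatives. This mirrors, in a deliberately simplified way, the generalization problem the abstract highlights.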

Funder

Israel Science Foundation

Publisher

Public Library of Science (PLoS)

References (55 articles)

1. miRBase: annotating high confidence microRNAs using deep sequencing data; A Kozomara; Nucleic Acids Research, 2013

2. MicroRNA biogenesis: regulating the regulators; EF Finnegan; Critical Reviews in Biochemistry and Molecular Biology, 2013

3. Gene silencing by microRNAs: contributions of translational repression and mRNA decay; E Huntzinger; Nature Reviews Genetics, 2011

4. The evolutionary origin of plant and animal microRNAs; Y Moran; Nature Ecology & Evolution, 2017

5. MicroRNA therapeutics: towards a new era for the management of cancer and other diseases; R Rupaimoole; Nature Reviews Drug Discovery, 2017