Abstract
Non-coding RNAs are increasingly recognized as playing roles in critical molecular mechanisms of disease. However, the non-coding genome of Drosophila melanogaster, one of the most powerful disease model organisms, has been understudied. Here, we present FLYNC (FLY Non-Coding discovery and classification), a novel machine learning-based model that predicts the probability that a newly identified RNA transcript is a long non-coding RNA (lncRNA). FLYNC is integrated into an end-to-end bioinformatics pipeline that processes single-cell or bulk RNA sequencing data and outputs candidate new non-coding RNA genes. FLYNC leverages large-scale genomic and transcriptomic datasets to identify patterns and features that distinguish non-coding genes from protein-coding genes, thereby facilitating lncRNA prediction. We demonstrate the application of FLYNC to a publicly available Drosophila adult head bulk transcriptome and to single-cell transcriptomic data from Drosophila neural stem cell lineages, and identify several novel tissue- and cell-specific lncRNAs. We further validated the existence of a set of FLYNC-positive hits experimentally by qPCR. Overall, our findings demonstrate that FLYNC is a robust tool for identifying lncRNAs in Drosophila melanogaster that addresses current limitations in ncRNA identification by harnessing machine learning.
Publisher
Cold Spring Harbor Laboratory