Expanding and Enriching the LncRNA Gene–Disease Landscape Using the GeneCaRNA Database
Published: 2024-06-12
Issue: 6
Volume: 12
Page: 1305
ISSN: 2227-9059
Container-title: Biomedicines
Language: en
Short-container-title: Biomedicines
Author:
Aggarwal Shalini (1), Rosenblum Chana (1), Gould Marshall (2), Ziman Shahar (1), Barshir Ruth (3), Zelig Ofer (4), Guan-Golan Yaron (4), Iny-Stein Tsippi (1), Safran Marilyn (1), Pietrokovski Shmuel (1), Lancet Doron (1)
Affiliation:
1. Department of Molecular Genetics, Weizmann Institute of Science, Herzl 234, Rehovot 7610010, Israel
2. Department of Biological Sciences, University College London, Gower Street, London WC1E 6BT, UK
3. TAD Center for AI and Data Science, Tel Aviv University, Tel Aviv 6997801, Israel
4. LifeMap Sciences Inc., Alameda, CA 94501, USA
Abstract
The GeneCaRNA human gene database is a member of the GeneCards Suite. It presents ~280,000 human non-coding RNA (ncRNA) genes, identified algorithmically from ~690,000 RNAcentral transcripts. This expands the ncRNA gene count roughly tenfold relative to other sources. GeneCaRNA thus contains ~120,000 long non-coding RNAs (lncRNAs, >200 bases long), including ~100,000 novel genes. The latter have sparse functional information and form a vast terra incognita for future research. LncRNA genes are uniformly represented on all nuclear chromosomes, with 10 genes on mitochondrial DNA. Data obtained from MalaCards, another GeneCards Suite member, show 1547 genes, each associated with 1 to 50 diseases. About 15% of the associations are supported by experimental evidence, with cancers tending to be multigenic. Preliminary text mining within GeneCaRNA identifies interactions of lncRNA transcripts with target gene products, of which 25% are ncRNAs and 75% are proteins. GeneCaRNA has a biological pathways section, which at present shows 131 pathways for 38 lncRNA genes, a basis for future expansion. Finally, our GeneHancer database provides regulatory elements for ~110,000 lncRNA genes, offering pointers to co-regulated genes and genetic linkages from enhancers to diseases. We anticipate that the broad vista provided by GeneCaRNA will serve as an essential guide for further lncRNA research in deciphering disease.
Funder
European Network of Excellence for Big Data in Prostate Cancer, Horizon 2020