LitGene: a transformer-based model that uses contrastive learning to integrate textual information into gene representations

Published: 2024-08-07

Author:

Jararweh Ala, Macaulay Oladimeji, Arredondo David, Oyebamiji Olufunmilola M, Hu Yue, Tafoya Luis, Zhang Yanfu, Virupakshappa Kushal, Sahu Avinash

Abstract

Representation learning approaches leverage sequence, expression, and network data, but utilize only a fraction of the rich textual knowledge accumulated in the scientific literature. We present LitGene, an interpretable transformer-based model that refines gene representations by integrating textual information. The model is enhanced through a Contrastive Learning (CL) approach that identifies semantically similar genes sharing a Gene Ontology (GO) term. LitGene demonstrates accuracy across eight benchmark predictions of protein properties and robust zero-shot learning capabilities, enabling the prediction of new potential disease risk genes in obesity, asthma, hypertension, and schizophrenia. LitGene’s SHAP-based interpretability tool illuminates the basis for identified disease-gene associations. An automated statistical framework gauges literature support for AI biomedical predictions, providing validation and improving reliability. LitGene’s integration of textual and genetic information mitigates data biases, enhances biomedical predictions, and promotes ethical AI practices by ensuring transparent, equitable, open, and evidence-based insights. LitGene code is open source and also available for use via a public web interface at litgene.avisahuai.com.

Publisher

Cold Spring Harbor Laboratory
