Graph Convolutional Networks for Predicting Cancer Outcomes and Stage: A Focus on cGAS-STING Pathway Activation
Published: 2024-09-11
Issue: 3
Volume: 6
Page: 2033-2048
ISSN: 2504-4990
Container-title: Machine Learning and Knowledge Extraction
Language: en
Short-container-title: MAKE
Author:
Sokač Mateo (1), Skračić Borna (1), Kučak Danijel (1), Mršić Leo (2)
Affiliation:
1. Software Engineering Department, Algebra University, 10000 Zagreb, Croatia
2. Algebra University, 10000 Zagreb, Croatia
Abstract
This study evaluated gene expression profiles from The Cancer Genome Atlas (TCGA). To reduce complexity, we focused on genes in the cGAS–STING pathway, which is crucial for cytosolic DNA detection and immune response. We analyzed three clinical variables: disease-specific survival (DSS), overall survival (OS), and tumor stage. Using high-dimensional gene expression data effectively requires a meaningful projection of those data. Since gene pathways can be represented as graphs, we employed a novel method of presenting genomics data as a graph data structure rather than in the conventional tabular format. To exploit this representation, we used a graph convolutional network (GCN) machine learning model in conjunction with genetic algorithm optimization. This yielded an optimal graph representation topology and captured important activations within the pathway for each use case, enabling a more insightful analysis of the cGAS–STING pathway and its activations across different cancer types and clinical variables. To address the problem of unexplainable AI, we combined graph visualization with the integrated gradients method to explain the GCN model's decision-making process and to identify key nodes (genes) in the cGAS–STING pathway. This approach revealed distinct molecular mechanisms and enhanced interpretability. The study demonstrates the potential of GCNs combined with explainable AI for analyzing gene expression and provides insights into cancer progression. Further research with more data is needed to validate these findings.
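To make the graph representation described in the abstract concrete, the following is a minimal sketch of a single graph convolution over a tiny pathway graph. It is not the authors' implementation: it assumes the widely used symmetric-normalized propagation rule (ReLU of D^-1/2 (A + I) D^-1/2 H W), and the three-gene subgraph, the expression values, and the random weights are all hypothetical placeholders chosen for illustration only.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2: adjacency with self-loops, symmetrically normalized."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each gene keeps its own signal
    deg = a_hat.sum(axis=1)              # node degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Hypothetical toy subgraph: nodes are genes, edges are assumed pathway links.
genes = ["CGAS", "STING1", "TBK1"]
adj = np.array(
    [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]],
    dtype=float,
)

# Made-up expression values for one sample, one feature per gene (not TCGA data).
expression = np.array([[0.2], [1.5], [0.7]])

# Random, untrained weights mapping 1 input feature to 4 hidden features.
rng = np.random.default_rng(0)
weights = rng.normal(size=(1, 4))

adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, expression, weights)

# Mean-pool node embeddings into one vector per sample; a classifier for
# DSS, OS, or tumor stage would be trained on top of such a vector.
graph_embedding = hidden.mean(axis=0)
```

In this sketch the adjacency matrix is fixed by hand. In the study, the graph topology is tuned with a genetic algorithm, which under this framing would amount to searching over candidate `adj` matrices.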
Funder
Algebra University