Using Artificial Intelligence to select drug targets in oncology

Author:

Oprea Tudor, Păunescu Virgil

Abstract

For decades, scientists have approached cancer as a disease of the genome (1). Efforts to collect multi-faceted, heterogeneous data, such as tissue-based somatic mutations (2) and cancer cell line expression and perturbation (3), have contributed to breakthroughs such as the Hallmarks of Cancer (4,5) and The Cancer Genome Atlas (TCGA) (6). These efforts have framed our understanding of cancer at the molecular level and laid the foundational roadmap for drug target identification in oncology. The therapeutic management of cancer, an out-of-control process of cellular proliferation and dissemination, typically aims to selectively inhibit specific molecules or pathways crucial for tumor growth and survival (7). Targeting specific mutations, such as BRAF V600E and KRAS G12C, has resulted in clinically successful treatments for melanoma (e.g., vemurafenib as a BRAF inhibitor) and non-small cell lung carcinoma (e.g., sotorasib as a KRAS inhibitor) (8). Target selection is a critical step in pharmaceutical research and development, as it remains the major driver of therapeutic efficacy and patient safety. As outlined elsewhere (8), target selection starts from identifying tumor-specific actionable mutations via next-generation sequencing (NGS). This nucleic acid sequencing technology identifies common and rare genetic aberrations in cancer. Through sequential oligonucleotide capture, amplification, and NGS, point-of-care diagnostic tools further support this process through mutational evaluation. In addition to patient-derived clinical data, pan-cancer analyses and biomedical literature are frequently used to understand molecular pathways affected by specific mutations, further guiding therapeutic target selection. Functional genomics (9), genome-wide association studies (GWAS), and polygenic scores (10) are increasingly incorporated in clinical model assessments of cancer therapeutic targets.
Despite the widespread use of these methodologies, several limitations have become apparent. First, cancer is a complex disease, with a subtle interplay between the environmental and genetic factors that govern tumor growth and survival. Intra-tumor heterogeneity studies improve our understanding of the evolutionary forces driving subclonal selection (11), whereas genetic (clonal) and non-genetic adaptive reprogramming events can explain primary and secondary drug resistance in cancer (12). Furthermore, elucidating the exact mechanism-of-action (MoA) targets of drugs in cancer is not trivial, as many anti-cancer drugs continue to exhibit tumoricidal activity even after the (suspected) MoA targets have been knocked out (13). Indeed, off-target effects often complicate the interpretation of biological phenotypes (e.g., loss of cell viability or slowing tumor growth) (14). Against this backdrop, large-scale data integration coupled with artificial intelligence and machine learning (AIML) (15) can improve target selection in oncology. AIML technologies can rapidly process a diverse set of oncology-related resources such as TCGA (6), COSMIC (2), DepMap (16), and others by coalescing large datasets into a seamlessly integrated platform. This is particularly true if large language models (LLMs) such as GPT-4 (17) are incorporated into the data ingestion workflow. From genomic and transcriptomic data to real-world evidence, AIML can sift through layers of evidence and produce models faster than traditional methods. This potential gain in efficiency, together with the ability to develop multiple models in parallel, can yield testable hypotheses. The ability to integrate and analyze vast datasets with AIML techniques holds promise for uncovering novel insights and therapeutic targets in various fields of medicine. These AIML advancements can be applied to most complex diseases, not just oncology.
For instance, neurodegenerative diseases like Alzheimer's disease present similar challenges due to their multifactorial nature and the interplay between genetic and environmental factors. Recognizing the potential of AIML in complex disease biology modeling, we integrated a set of 17 different resources focused on expression data, pathways, functional terms, and phenotypic information with XGBoost (18), an optimized gradient boosting (machine learning) algorithm, and MetaPath (19), a feature-extraction technique, to seek novel genes associated with Alzheimer's disease (20). Of the top-20 ML-predicted genes previously not associated with Alzheimer's pathology, five were experimentally confirmed using multiple methods. The same set of integrated resources, combined with MetaPath and XGBoost, resulted in the temporally validated identification of seven top-20 and two bottom-20 genes associated with autophagy (21). Building on our success in Alzheimer's and autophagy research, we used this integrated approach (the above dataset and algorithms) to develop 41 distinct blood cancer AIML models starting from primary tumor type and histology (22). We contrasted 725 cancer-specific genes curated in the COSMIC Cancer Gene Census, serving as the positive set, with 440 manually curated housekeeping genes that served as the negative set. The 41 AIML models identified the expected "frequent hitters," such as GAPDH, AKT1, HRAS, TLR4, and TP53, all having well-understood roles in cancer. Other genes, such as IRAK3, EPHB1, ITPKB, ACVR2B, and CAMK2D, were predicted to be relevant in 10 or more hematology/oncology malignancies. In contrast, some genes were associated with just one cancer: for example, LPAR5, GPR18, and FCER2 are predicted to be relevant only in primary bone diffuse large B cell lymphoma (22). Cell-based validation studies for some of these genes are ongoing.
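As a rough illustration of the positive-set versus negative-set setup described above, the following minimal sketch counts simple gene-annotation-gene paths from each gene to labeled cancer genes and to labeled housekeeping genes. It is not the authors' MetaPath implementation, which is described in the cited work; the gene names, pathway identifiers, and the `metapath_counts` helper are hypothetical, and the toy graph stands in for the 17 integrated resources.

```python
from collections import defaultdict

# Toy gene -> pathway annotations (hypothetical identifiers, not real data).
gene_to_pathway = {
    "GENE_A": {"P1", "P2"},
    "GENE_B": {"P2"},
    "GENE_C": {"P3"},
    "GENE_D": {"P1", "P3"},
}

positives = {"GENE_A"}  # stand-in for curated cancer genes
negatives = {"GENE_C"}  # stand-in for curated housekeeping genes


def metapath_counts(gene, gene_to_node, labeled_genes):
    """Count gene -> node -> labeled-gene paths, excluding the gene itself."""
    # Invert the annotation map so each node lists the genes attached to it.
    node_to_genes = defaultdict(set)
    for g, nodes in gene_to_node.items():
        for node in nodes:
            node_to_genes[node].add(g)
    count = 0
    for node in gene_to_node.get(gene, ()):
        # Dropping the query gene keeps its own label out of its features.
        count += len((node_to_genes[node] & labeled_genes) - {gene})
    return count


# One feature row per gene: [paths to positives, paths to negatives].
features = {
    gene: [
        metapath_counts(gene, gene_to_pathway, positives),
        metapath_counts(gene, gene_to_pathway, negatives),
    ]
    for gene in gene_to_pathway
}

# Training labels exist only for genes in the curated sets.
labels = {g: 1 for g in positives}
labels.update({g: 0 for g in negatives})
```

In a fuller pipeline, the rows for labeled genes could be passed to a gradient boosting classifier such as `xgboost.XGBClassifier`, and the fitted model used to rank the unlabeled genes. Excluding each gene from its own path counts is one simple guard against the label leakage discussed later in this abstract.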
Although AI-based target selection in oncology primarily relies on gene-phenotype association models, it also offers other potential applications: 1) processing oncology biomarkers for therapeutic targeting; 2) enhancing the understanding of gene variants of uncertain significance (VUS) through in-depth context and real-world evidence; and 3) improving the interpretation of animal and preclinically validated models by incorporating human pathology and physiology. Challenges and limitations of AIML technologies include: 1) data and information quality, where the maxim "garbage in, garbage out" underscores the importance of data veracity; 2) model interpretability, which is increasingly addressed through "explainable AI" to ensure that AIML models can be interpreted by humans and can aid decision-making in research and clinical development; and 3) awareness of data bias and leakage as well as ethical considerations, to prevent discriminatory practices and ensure fairness in model development. The future of target selection in oncology is likely to incorporate AIML technologies. By processing vast datasets more rapidly and efficiently and by offering enhanced context for gene VUS, somatic mutations, and biomolecular pathways, AIML models are poised to improve target identification and validation for common and rare cancers.

Publisher

Asociatia Societatea Transdisciplinara de Oncologie Personalizata Pentru Combaterea Cancerului - Stop Cancer
