Abstract
Purpose: Ovarian cancer (OC) is the most common type of gynecologic cancer in the world, with a high rate of mortality. Due to the manifestation of generic symptoms and the absence of specific biomarkers, OC is usually diagnosed at a late stage. Machine learning models can be employed to predict driver genes implicated in causative mutations.

Design/methodology/approach: In the present study, a comprehensive next-generation sequencing (NGS) analysis of whole exome sequences from 47 OC patients was carried out to identify clinically significant mutations. Nine functional features of the 708 identified mutations were input into an eXtreme Gradient Boosting (XGBoost) classifier to predict OC driver genes.

Findings: The XGBoost classifier yielded a classification accuracy of 0.946, which was superior to that obtained by other classifiers such as decision tree, Naive Bayes, random forest and support vector machine. Further, an interaction network was generated to identify and establish correlations with cancer-associated pathways and gene ontology data.

Originality/value: The final results revealed 12 putative candidate cancer driver genes, namely LAMA3, LAMC3, COL6A1, COL5A1, COL2A1, UGT1A1, BDNF, ANK1, WNT10A, FZD4, PLEKHG5 and CYP2C9, that may have implications in clinical diagnosis.
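To make the classification step more concrete, the following is a minimal sketch of gradient boosting for a binary label, written in plain Python. It is not the authors' pipeline and it is not XGBoost itself: it uses simple one-split trees (stumps) fitted to the gradient of the logistic loss, without the regularization and second-order terms that XGBoost adds. The data are synthetic, the nine columns only echo the number of features mentioned in the abstract, and all function names (`fit_stump`, `fit_boosting`, `predict`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a one-split regression tree to the residuals (least squares)."""
    n = len(X)
    total = sum(residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no valid threshold between equal values
            right_sum = total - left_sum
            # Maximising this gain is equivalent to minimising squared error.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]


def stump_predict(stump, x):
    feature, threshold, left_value, right_value = stump
    return left_value if x[feature] <= threshold else right_value


def fit_boosting(X, y, rounds=50, lr=0.5):
    """Additive model of stumps, each fitted to the logistic-loss gradient."""
    positive_rate = sum(y) / len(y)
    base = math.log(positive_rate / (1 - positive_rate))  # prior log-odds
    scores = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the logistic loss: label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * stump_predict(stump, x) for s, x in zip(scores, X)]
    return base, stumps, lr


def predict(model, x):
    base, stumps, lr = model
    score = base + sum(lr * stump_predict(s, x) for s in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


# Synthetic stand-in data: 200 samples, 9 features, label driven by two of them.
rng = random.Random(0)
X = [[rng.random() for _ in range(9)] for _ in range(200)]
y = [1 if x[0] + x[1] > 1.0 else 0 for x in X]

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

model = fit_boosting(X_train, y_train)
preds = [predict(model, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The accuracy printed here describes only the synthetic data and says nothing about the 0.946 reported in the study. In practice one would likely use the XGBoost library and cross-validation rather than a single hold-out split.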
Subject
Library and Information Sciences, Information Systems