Author:
Hossain Sk Md Mosaddek,Khatun Lutfunnesa,Ray Sumanta,Mukhopadhyay Anirban
Abstract
AbstractClassifying pan-cancer samples using gene expression patterns is a crucial challenge for the accurate diagnosis and treatment of cancer patients. Machine learning algorithms have been considered proven tools to perform downstream analysis and capture the deviations in gene expression patterns across diversified diseases. In our present work, we have developed PC-RMTL, a pan-cancer classification model using regularized multi-task learning (RMTL) for classifying 21 cancer types and adjacent normal samples using RNASeq data obtained from TCGA. PC-RMTL is observed to outperform when compared with five state-of-the-art classification algorithms, viz. SVM with the linear kernel (SVM-Lin), SVM with radial basis function kernel (SVM-RBF), random forest (RF), k-nearest neighbours (kNN), and decision trees (DT). The PC-RMTL achieves 96.07% accuracy and 95.80% MCC score for a completely unknown independent test set. The only method that appears as the real competitor is SVM-Lin, which nearly equalizes the accuracy in prediction of PC-RMTL but only when complete feature sets are provided for training; otherwise, PC-RMTL outperformed all other classification models. To the best of our knowledge, this is a significant improvement over all the existing works in pan-cancer classification as they have failed to classify many cancer types from one another reliably. We have also compared gene expression patterns of the top discriminating genes across the cancers and performed their functional enrichment analysis that uncovers several interesting facts in distinguishing pan-cancer samples.
Publisher
Springer Science and Business Media LLC
Cited by
10 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献