Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning-Reference-Cited by-同舟云学术

Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning

Published:2021-08-27 Issue:1 Volume:21 Page:
ISSN:1471-2407
Container-title:BMC Cancer
language:en
Short-container-title:BMC Cancer

Author:

Gupta Rajinder,Kleinjans Jos,Caiment Florian

Abstract

Abstract Background Hepatocellular carcinoma (HCC) is one of the leading causes of cancer death in the world owing to limitations in its prognosis. The current prognosis approaches include radiological examination and detection of serum biomarkers, however, both have limited efficiency and are ineffective in early prognosis. Due to such limitations, we propose to use RNA-Seq data for evaluating putative higher accuracy biomarkers at the transcript level that could help in early prognosis. Methods To identify such potential transcript biomarkers, RNA-Seq data for healthy liver and various HCC cell models were subjected to five different machine learning algorithms: random forest, K-nearest neighbor, Naïve Bayes, support vector machine, and neural networks. Various metrics, namely sensitivity, specificity, MCC, informedness, and AUC-ROC (except for support vector machine) were evaluated. The algorithms that produced the highest values for all metrics were chosen to extract the top features that were subjected to recursive feature elimination. Through recursive feature elimination, the least number of features were obtained to differentiate between the healthy and HCC cell models. Results From the metrics used, it is demonstrated that the efficiency of the known protein biomarkers for HCC is comparatively lower than complete transcriptomics data. Among the different machine learning algorithms, random forest and support vector machine demonstrated the best performance. Using recursive feature elimination on top features of random forest and support vector machine three transcripts were selected that had an accuracy of 0.97 and kappa of 0.93. Of the three transcripts, two were protein coding (PARP2–202 and SPON2–203) and one was a non-coding transcript (CYREN-211). Lastly, we demonstrated that these three selected transcripts outperformed randomly taken three transcripts (15,000 combinations), hence were not chance findings, and could then be an interesting candidate for new HCC biomarker development. Conclusion Using RNA-Seq data combined with machine learning approaches can aid in finding novel transcript biomarkers. The three biomarkers identified: PARP2–202, SPON2–203, and CYREN-211, presented the highest accuracy among all other transcripts in differentiating the healthy and HCC cell models. The machine learning pipeline developed in this study can be used for any RNA-Seq dataset to find novel transcript biomarkers. Code: www.github.com/rajinder4489/ML_biomarkers

Publisher

Springer Science and Business Media LLC

Subject

Cancer Research,Genetics,Oncology

Link

https://link.springer.com/content/pdf/10.1186/s12885-021-08704-9.pdf

Reference80 articles.

1. El-Serag HB. Hepatocellular carcinoma. N Engl J Med. 2011;365(12):1118–27. https://doi.org/10.1056/NEJMra1001683.

2. Stewart B, Wild CP: World cancer report 2014. 2014.

3. Kochanek KD, Murphy SL, Xu J, Arias E: Deaths: final data for 2017. 2019.

4. Yang JD, Hainaut P, Gores GJ, Amadou A, Plymoth A, Roberts LR. A global view of hepatocellular carcinoma: trends, risk, prevention and management. Nat Rev Gastroenterol Hepatol. 2019;16(10):589–604. https://doi.org/10.1038/s41575-019-0186-y.

5. Roberts LR, Sirlin CB, Zaiem F, Almasri J, Prokop LJ, Heimbach JK, et al. Imaging for the diagnosis of hepatocellular carcinoma: a systematic review and meta-analysis. Hepatology. 2018;67(1):401–21. https://doi.org/10.1002/hep.29487.

Cited by 13 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Machine-Learning-Based Identification of Key Feature RNA-Signature Linked to Diagnosis of Hepatocellular Carcinoma;Journal of Clinical and Experimental Hepatology;2024-11

2. Integrating Omics Data and AI for Cancer Diagnosis and Prognosis;Cancers;2024-07-03

3. Circulating miRNA’s biomarkers for early detection of hepatocellular carcinoma in Egyptian patients based on machine learning algorithms;Scientific Reports;2024-02-29

4. Decision-Making on the Diagnosis of Oncological Diseases Using Cost-Sensitive SVM Classifiers Based on Datasets with a Variety of Features of Different Natures;Mathematics;2024-02-08

5. Classification of Long Non-Coding RNAs s Between Early and Late Stage of Liver Cancers From Non-coding RNA Profiles Using Machine-Learning Approach;Bioinformatics and Biology Insights;2024-01