Affiliation:
1. Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia
Abstract
This paper addresses the diagnosis of oncological diseases from blood protein markers. The goal of the study is to develop a novel approach to decision-making in such diagnosis by generating datasets that combine different kinds of features: known features corresponding to blood protein markers, and new features generated with mathematical tools, in particular the non-linear dimensionality reduction algorithm UMAP and formulas for various entropies and fractal dimensions. These datasets were used to develop a group of multiclass kNN and SVM classifiers, with oversampling algorithms applied to address the class imbalance that is typical of medical diagnostics datasets. The experimental results confirmed the feasibility of using the UMAP algorithm, the approximation entropy, and the Katz and Higuchi fractal dimensions to generate new features from blood protein markers. Various combinations of these features can be used to expand the feature set of the original dataset and thereby improve the quality of the classification decisions for diagnosing oncological diseases. The best kNN classifier was developed on the original dataset augmented with a feature based on the approximation entropy, and the best SVM classifier on the original dataset augmented with features based on the UMAP algorithm and the approximation entropy. The average MacroF1-score obtained during cross-validation increased by 16.138% for the kNN classifier and by 4.219% for the SVM classifier, compared with the corresponding classifiers developed on the original dataset alone.
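To make two of the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that one patient's ordered marker values are treated as a short series, uses an amplitude-only variant of the Katz fractal dimension (one of several definitions in use), and computes the MacroF1-score as the unweighted mean of per-class F1 values. The function names `katz_fd`, `augment`, and `macro_f1` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def katz_fd(series: Sequence[float]) -> float:
    """Katz fractal dimension, amplitude-only variant (an assumption)."""
    if len(series) < 3:
        raise ValueError("need at least three points")
    # Total curve length L: sum of absolute differences between neighbours.
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    length = sum(steps)
    # Diameter d: largest distance from the first point to any later point.
    diameter = max(abs(x - series[0]) for x in series[1:])
    if length == 0:
        return math.nan  # flat series: the dimension is undefined
    # n = L / mean step, which equals the number of steps.
    log_n = math.log10(len(steps))
    denom = log_n + math.log10(diameter / length)
    # Degenerate shapes can give a non-positive denominator; report NaN.
    return log_n / denom if denom > 0 else math.nan


def augment(row: Sequence[float]) -> List[float]:
    """Return the original marker values plus one generated feature."""
    return list(row) + [katz_fd(row)]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    print(augment([0.0, 2.0, 1.0, 3.0]))
    print(macro_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2]))
```

Because every class contributes equally to the macro average, a classifier that ignores a rare class is penalised heavily, which is presumably why this metric suits the imbalanced setting described above. The UMAP embedding, the entropy features, and the oversampling step are not shown here, since they would normally rely on third-party libraries.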
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by
5 articles.