Cervical Cancer Diagnostics Using Machine Learning Algorithms and Class Balancing Techniques

Authors:

Glučina Matko, Lorencin Ariana, Anđelić Nikola, Lorencin Ivan

Abstract

Objectives: Cervical cancer is in most cases a squamous cell carcinoma or an adenocarcinoma, and it usually results from infection with human papillomavirus (HPV). It is the third most common cancer of the female reproductive organs. The risk groups are mostly younger women who frequently change partners, began sexual intercourse early, are infected with HPV, or smoke. In most cases, the cancer is asymptomatic until it has progressed to the later stages. Cervical cancer screening rates are low, especially in developing countries and in some minority groups. A tentative screening method based on a questionnaire could therefore enable more cervical cancers to be diagnosed in the initial stages of the disease.

Methods: This research uses publicly available cervical cancer data collected from 859 female patients. Each sample consists of 36 input attributes and four different outputs: Hinselmann, Schiller, cytology, and biopsy. Because the data set is significantly imbalanced, five class balancing techniques were applied: the Synthetic Minority Oversampling Technique (SMOTE), the ADAptive SYNthetic algorithm (ADASYN), SMOTEENN, random oversampling, and SMOTETOMEK. To predict the target outputs, several artificial intelligence (AI) and machine learning (ML) classification algorithms were used: logistic regression, multilayer perceptron (MLP), support vector machine (SVM), K-nearest neighbors (KNN), and several naive Bayes methods.

Results: The highest performance was achieved when MLP and KNN were combined with random oversampling, SMOTEENN, and SMOTETOMEK. This approach yielded a mean area under the receiver operating characteristic curve (AUC) and a mean Matthews correlation coefficient (MCC) higher than 0.95, regardless of which diagnostic method was used to construct the output vector.

Conclusions: The presented results indicate that AI and ML techniques could be used to develop a tentative cervical cancer screening method based on a questionnaire and an AI-based algorithm. Furthermore, class balancing techniques can provide a measurable performance boost.
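The abstract names oversampling, KNN, and the AUC and MCC metrics without showing how they fit together. The following is a minimal pure-Python sketch under stated assumptions: the toy data, the split, the k values, and the helper names (`smote`, `knn_score`, `mcc`, `auc`) are hypothetical illustrations, not the authors' code, data, or hyperparameters. It shows basic SMOTE (interpolating between a minority sample and one of its k nearest minority neighbours), a KNN class-1 score, and MCC and AUC computed from their definitions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of `base` (Euclidean), excluding itself
        neighbours = sorted(
            (x for x in minority if x is not base),
            key=lambda x: math.dist(base, x),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def knn_score(train_X, train_y, x, k=5):
    """Fraction of the k nearest training samples that belong to class 1."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC = 0 when any confusion-matrix margin is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data (NOT the cervical cancer data set): two features,
    # 90 majority (label 0) and 10 minority (label 1) samples
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    minority = [[rng.gauss(2.5, 1), rng.gauss(2.5, 1)] for _ in range(10)]
    # Deterministic stratified split: 70 % train, 30 % test
    X_train, y_train = majority[:63] + minority[:7], [0] * 63 + [1] * 7
    X_test, y_test = majority[63:] + minority[7:], [0] * 27 + [1] * 3
    # Balance only the training split so no synthetic point reaches the test set
    n_new = y_train.count(0) - y_train.count(1)
    X_bal = X_train + smote(minority[:7], n_new, k=3)
    y_bal = y_train + [1] * n_new
    scores = [knn_score(X_bal, y_bal, x, k=5) for x in X_test]
    preds = [int(s >= 0.5) for s in scores]
    print(f"AUC = {auc(y_test, scores):.3f}, MCC = {mcc(y_test, preds):.3f}")
```

Resampling is applied to the training split only, so the metrics are computed on real samples. For practical work, the imbalanced-learn library provides maintained implementations of SMOTE, ADASYN, SMOTEENN, SMOTETomek, and random oversampling.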

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Cited by 5 articles.