Tıbbi Verilerde Heinz Ortalamasına Dayalı Yeni Sentetik Veriler Üreterek Veri Kümesini Dengeleme

Author:

GÜMÜŞ İbrahim Halil1,GÜLDAL Serkan2

Affiliation:

1. ADIYAMAN UNIVERSITY

2. Adıyaman Üniversitesi

Abstract

Advances in science and technology have caused data sizes to increase at a great rate. Thus, unbalanced data has arisen. A dataset is unbalanced if the classes are not nearly equally represented. In this case, classifying the data causes performance values to decrease because the classification algorithms are developed on the assumption that the datasets are balanced. As the accuracy of the classification favors the majority class, the minority class is often misclassified. The majority of datasets, especially those used in the medical field, have an unbalanced distribution. To balance this distribution, several studies have been performed recently. These studies are undersampling and oversampling processes. In this study, distance and mean based resampling method is used to produce synthetic samples using minority class. For the resampling process, the closest neighbors for all data points belonging to the minority class were determined by using the Euclidean distance. Based on these neighbors and using the Heinz Mean, the desired number of new synthetic samples were formed between each sample to obtain balance. The Random Forest (RF) and Support Vector Machine (SVM) algorithms are used to classify the raw and balanced datasets, and the results were compared. Additionally, the other well known methods (Random Over Sampling-ROS, Random Under Sampling-RUS, and Synthetic Minority Oversampling TEchnique-SMOTE) are compared with the proposed method. It was shown that the balanced dataset using the proposed resampling method increases classification efficiency as compared to the raw dataset and other methods. Accuracy measurements of RF are 0.751 and 0.799 and, accuracy measurements of SVM are 0.762 and 0.781 for raw data and resampled data respectively. Likewise, there are improvements in the other metrics such as Precision, Recall, and F1 Score.

Publisher

Afyon Kocatepe Universitesi Fen Ve Muhendislik Bilimleri Dergisi

Subject

General Engineering

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3