Comparative analysis of Weka-based classification algorithms on medical diagnosis datasets

Author:

Dou Yifeng (1, 2), Meng Wentao (1, 2)

Affiliation:

1. Network Information Center, Tianjin Baodi Hospital, Tianjin, China

2. Baodi Clinical College, Tianjin Medical University, Tianjin, China

Abstract

BACKGROUND: With the advent of 5G and the era of big data, medical information technology has developed rapidly worldwide. The widespread use of electronic medical records and the digitization of medical equipment and instruments have caused large amounts of data, including clinical diagnosis data and hospital management data, to accumulate in hospital database systems.

OBJECTIVE: This study examined how well different machine learning algorithms classify medical datasets, in order to better assess the value of machine learning methods in aiding medical diagnosis.

METHODS: Classification datasets from four medical fields in the University of California Irvine (UCI) machine learning database were used as the research object. Six categories of classification models, based on Bayes' theorem, ensemble learning, rules, and trees, were constructed on the Weka platform.

RESULTS: In the between-group experiments, the Random Forest algorithm achieved the best results on the Indian liver disease patient dataset (ILPD), the delivery cardiotocography dataset (CADG), and the lymphatic tractography dataset (LYMP), followed by Bagging and partition and regression tree. In the within-group comparison of ensemble-based algorithms, Bagging outperformed the other algorithms on 11 metrics across all datasets, mainly on the 2 binary datasets; Logit Boost showed notable performance on only 7 metrics; and Rotation Forest was the best algorithm, with 28 metrics reaching optimal values. Among the tree-based algorithms, the logistic model tree algorithm achieved optimal results on all metrics on the mammographic dataset (MAGR), whereas BFTree, J48, and Random Tree performed poorly on every dataset. The best tree-based algorithm was Random Forest, with 27 metrics reaching the optimum on the ILPD, CADG, and LYMP datasets.

CONCLUSION: Machine learning algorithms have good application value in disease prediction and can provide a reference basis for disease diagnosis.
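
The RESULTS above rank algorithms by how many metrics reach their optimal value. The following is a minimal sketch of one way such a tally could be computed, not the authors' procedure. The function name `count_best_metrics`, the dataset and metric labels, and all scores are hypothetical placeholders and are not taken from the paper. The sketch also assumes that a higher value is better for every metric and that ties credit every tied algorithm.

```python
from collections import Counter


def count_best_metrics(scores):
    """Count, per algorithm, the (dataset, metric) pairs on which it is best.

    `scores` maps dataset -> metric -> algorithm -> value.
    Assumption: higher is better for every metric; ties credit all tied algorithms.
    """
    wins = Counter()
    for metrics in scores.values():
        for by_algorithm in metrics.values():
            best = max(by_algorithm.values())
            for algorithm, value in by_algorithm.items():
                if value == best:
                    wins[algorithm] += 1
    return wins


# Placeholder values for illustration only; NOT results from the paper.
toy_scores = {
    "dataset_A": {
        "accuracy": {"RandomForest": 0.80, "Bagging": 0.78, "J48": 0.70},
        "f_measure": {"RandomForest": 0.79, "Bagging": 0.79, "J48": 0.68},
    },
    "dataset_B": {
        "accuracy": {"RandomForest": 0.90, "Bagging": 0.91, "J48": 0.85},
    },
}

wins = count_best_metrics(toy_scores)
for algorithm, n in wins.most_common():
    print(algorithm, n)
```

Metrics where lower is better (for example, error rates) would need to be negated or handled separately before tallying.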

Publisher

IOS Press

Subject

Health Informatics, Biomedical Engineering, Information Systems, Biomaterials, Bioengineering, Biophysics
