Comparative analysis of weka-based classification algorithms on medical diagnosis datasets

Published: 2023-04-28 Volume: 31 Pages: 397-408
ISSN: 0928-7329
Container-title: Technology and Health Care
Short-container-title: THC

Author:

Dou Yifeng¹,², Meng Wentao¹,²

Affiliation:

1. Network Information Center, Tianjin Baodi Hospital, Tianjin, China

2. Baodi Clinical College, Tianjin Medical University, Tianjin, China

Abstract

BACKGROUND: With the advent of 5G and the era of Big Data, medical information technology has developed rapidly around the world, electronic medical records and cases are in widespread use, and medical equipment and instruments have been digitized. As a result, hospital database systems have accumulated a large amount of data, including clinical diagnosis data and hospital management data.
OBJECTIVE: This study examined how well different machine learning algorithms classify medical datasets, in order to better explore the value of machine learning methods in aiding medical diagnosis.
METHODS: Classification datasets from four medical fields in the University of California Irvine (UCI) Machine Learning Repository were used as the research objects. Six categories of classification models, based on Bayes' theorem, ensemble learning, rules, and trees, were constructed on the Weka platform.
RESULTS: In the between-group experiments, Random Forest achieved the best results on the Indian Liver Patient Dataset (ILPD), cardiotocography (CADG), and lymphography (LYMP) datasets, followed by Bagging and partition and regression tree. In the within-group comparisons of ensemble-based algorithms, Bagging outperformed the others on 11 metrics across all datasets, mainly on the 2 binary-class datasets; Logit Boost performed notably well on only 7 metrics; and Rotation Forest was the best algorithm, with 28 metrics reaching optimal values. Among the tree-based algorithms, the logistic model tree algorithm achieved optimal results on all metrics on the mammographic dataset (MAGR), whereas BFTree, J48, and Random Tree classified poorly on every dataset. The best tree-based algorithm was Random Forest, with 27 metrics reaching the optimum on the ILPD, CADG, and LYMP datasets.
CONCLUSION: Machine learning algorithms have good application value in disease prediction and can provide a reference basis for disease diagnosis.
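
The within-group comparison in the abstract ranks algorithms by how many metrics reach their optimal value. The following is a minimal sketch of that tallying idea, not the authors' procedure. The dataset names, metric names, and score values are hypothetical placeholders rather than results from the study. The sketch also assumes that higher is better for every metric and that tied algorithms each receive credit; the abstract specifies neither.

```python
from collections import Counter

# HYPOTHETICAL placeholder scores, NOT results from the study.
# Layout: scores[dataset][metric][algorithm] -> value (assumed: higher is better).
scores = {
    "dataset_A": {
        "accuracy": {"RandomForest": 0.80, "Bagging": 0.78, "J48": 0.74},
        "f_measure": {"RandomForest": 0.79, "Bagging": 0.80, "J48": 0.73},
    },
    "dataset_B": {
        "accuracy": {"RandomForest": 0.91, "Bagging": 0.91, "J48": 0.88},
        "f_measure": {"RandomForest": 0.90, "Bagging": 0.89, "J48": 0.87},
    },
}


def count_optimal_metrics(scores):
    """Count, per algorithm, the (dataset, metric) pairs where it attains the best value.

    Ties credit every tied algorithm (an assumption of this sketch).
    """
    wins = Counter()
    for metrics in scores.values():
        for by_algorithm in metrics.values():
            best = max(by_algorithm.values())
            for name, value in by_algorithm.items():
                if value == best:
                    wins[name] += 1
    return wins


wins = count_optimal_metrics(scores)
for name, count in wins.most_common():
    print(f"{name}: optimal on {count} metric(s)")
```

Error-type metrics, where lower is better, would need to be negated or handled with `min` before tallying.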

Publisher

IOS Press

Subject

Health Informatics,Biomedical Engineering,Information Systems,Biomaterials,Bioengineering,Biophysics

Cited by 5 articles.

1. Finding the best predictive model for hypertensive depression in older adults based on machine learning and metabolomics research;Frontiers in Psychiatry;2024-06-27

2. Comparative analysis of supervised learning algorithms for prediction of cardiovascular diseases;Technology and Health Care;2024-05-31

3. Value of magnetic resonance imaging radiomics features in predicting histologic grade of invasive ductal carcinoma of the breast;Technology and Health Care;2024-05-10

4. A machine learning based depression screening framework using temporal domain features of the electroencephalography signals;PLOS ONE;2024-03-27

5. Enhancement of Recommendation Engine Technique for Bug System Fixes;Journal of Advances in Information Technology;2024