Author:
Satyajit Uparkar, Dhote Sunita, Pathan Shabana, Shobhane Purushottam, Das Debasis
Abstract
The primary issue in data analysis is the scalability of data mining methods. Prior research has explored various scaling options to overcome this problem. In this research, several scaling strategies are explored and tested on various datasets, and a cascade scaling method is proposed to improve the efficacy of existing methods. The proposed method starts by gathering a large dataset and pre-processing it. After pre-processing, the dataset is split into smaller subsets of equal size, and a data mining method is applied to each subset. The outcomes from all subsets are then pooled and aggregated to produce the final results. Performance is evaluated by the accuracy of the given algorithm. The proposed method and existing methods are evaluated on two health care datasets: PIMA Indian Diabetes and Heart Disease. Across the data mining methods tested, the proposed scaling approach gives better results than the existing scaling approaches. On both datasets, the proposed method is also compared with results published by other authors in earlier studies, and it outperformed that previous research. For a few data mining methods, the proposed method achieves 100 percent accuracy.
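The split, mine, and aggregate steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how subset outcomes are "pooled and aggregated", so majority voting is assumed here, and a toy nearest-centroid classifier stands in for the data mining method. All function names and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean


def split_equal(rows, k):
    """Split rows into k subsets of equal size (leftover rows are dropped)."""
    size = len(rows) // k
    return [rows[i * size:(i + 1) * size] for i in range(k)]


def fit_centroids(subset):
    """Stand-in 'data mining method': one mean feature vector per class label."""
    by_label = {}
    for features, label in subset:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*feats)]
        for label, feats in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def cascade_predict(models, features):
    """Assumed aggregation rule: majority vote over the per-subset models."""
    votes = Counter(predict(model, features) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(models, rows):
    """Fraction of rows whose aggregated prediction matches the true label."""
    hits = sum(cascade_predict(models, f) == y for f, y in rows)
    return hits / len(rows)


# Hypothetical pre-processed data: (features, label) pairs.
train = [
    ([1.0, 1.0], 0), ([5.0, 5.0], 1),
    ([1.2, 0.8], 0), ([4.8, 5.2], 1),
    ([0.9, 1.1], 0), ([5.1, 4.9], 1),
]
held_out = [([1.1, 0.9], 0), ([5.0, 5.1], 1)]

# One model per equal-sized subset, then aggregate at prediction time.
models = [fit_centroids(subset) for subset in split_equal(train, 3)]
score = accuracy(models, held_out)
```

Using an odd number of subsets avoids tied votes in a two-class problem such as the diabetes and heart disease datasets; other aggregation rules (for example averaging class probabilities) would fit the same structure.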
Publisher
Perpetual Innovation Media Pvt. Ltd.
Cited by
1 article.
1. Analytics of Epidemiological Data using Machine Learning Models. International Journal of Next-Generation Computing, 2023-02-15.