Abstract
Introduction/purpose: Machine learning methods have become indispensable for analyzing large-scale, complex data, with applications ranging from optimizing business operations to advancing scientific research. Large datasets offer considerable potential for insight and innovation, but they also pose significant challenges in data quality and structure and therefore require effective management strategies. Machine learning techniques are essential tools for identifying and mitigating these challenges. The MNIST dataset, a large collection of handwritten digits, is a widely used example in this field and is frequently employed in classification and analysis tasks, as in the present study.
Methods: The study applied several statistical techniques to the MNIST dataset, including Principal Component Analysis (PCA) implemented in Python. Support Vector Machine (SVM) models were also applied to both linear and non-linear classification problems, and their accuracy was assessed.
Results: PCA was effective for dimensionality reduction but may be less effective for visualization. Both linear and non-linear SVM models classified the dataset effectively.
Conclusion: The findings indicate that SVM can be an effective technique for classification problems.
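The workflow outlined in the Methods (PCA for dimensionality reduction, then linear and non-linear SVM classifiers) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code. It assumes scikit-learn is installed, and it uses the small 8x8 digits set bundled with scikit-learn as a stand-in for MNIST so that it runs offline. The 30 retained components, the feature scaling step, the 75/25 split, the RBF kernel as the non-linear model, and the function names are all assumptions made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_svm(kernel, n_components=30, seed=0):
    """Return test accuracy of a scaling + PCA + SVM pipeline.

    Assumption: load_digits (8x8 images) stands in for MNIST (28x28 images).
    """
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = make_pipeline(
        StandardScaler(),                                   # put pixels on a common scale
        PCA(n_components=n_components, random_state=seed),  # dimensionality reduction
        SVC(kernel=kernel),                                 # "linear" or "rbf" (non-linear)
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


def two_component_variance_share():
    """Fraction of total variance kept by the first two principal components."""
    X, _ = load_digits(return_X_y=True)
    pca = PCA(n_components=2).fit(X)
    return float(pca.explained_variance_ratio_.sum())


if __name__ == "__main__":
    for kernel in ("linear", "rbf"):
        print(kernel, "test accuracy:", round(evaluate_svm(kernel), 3))
    print("variance kept by 2 components:", round(two_component_variance_share(), 3))
```

The second function gives one way to probe the visualization remark in the Results: if two components retain only a modest share of the variance, a 2-D scatter plot of the projected digits will tend to show overlapping classes.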
Publisher
Centre for Evaluation in Education and Science (CEON/CEES)
Cited by
3 articles.