Affiliation:
1. Bharath Institute of Higher Education and Research, India
Abstract
Data-driven problem-solving requires the ability to use advanced computational methods to explain fundamental phenomena to a broad audience. Political and social science research also needs this capability. Quantitative methods often require knowledge of the concepts, trends, and facts that shape a study programme, yet researchers frequently do not know the structure or underlying assumptions of the data they analyse. Data exploration is also often given little attention in social science research methodology instruction, even though it is an essential step in applied research before predictive modelling and hypothesis testing. Clustering is a core data-mining technique, and choosing the right number of clusters is key to improving the accuracy of predictive models on large datasets. K-means is a popular unsupervised machine learning (ML) algorithm that partitions the data into discrete, non-overlapping clusters, assigning each observation to exactly one group. Choosing the best k-means approach can nevertheless be difficult. On the Human Freedom Index (HFI) dataset, mini-batch k-means (MBK-means) combined with the Hamerly method reduces the number of iterations and increases clustering efficiency. The silhouette_score function from scikit-learn was used to compute the mean silhouette coefficient of all samples for various cluster counts. A clustering with fewer negative silhouette values is considered better, and the cluster count with the highest mean silhouette score is taken as optimal.
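As an illustration of the cluster-count selection described above, the following Python sketch fits scikit-learn's MiniBatchKMeans for several values of k and reports, for each, the mean silhouette coefficient (silhouette_score) and the number of samples whose individual silhouette is negative (silhouette_samples). For sample i, the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points in its own cluster and b(i) is the mean distance from i to the points in the nearest other cluster, so values lie in [-1, 1]. This is a minimal sketch under stated assumptions: it uses synthetic data from make_blobs in place of the HFI dataset, and the helper evaluate_k, the range of k, and the batch size are hypothetical choices. scikit-learn's MiniBatchKMeans does not implement the Hamerly method, so that acceleration is not shown here.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Assumption: synthetic stand-in data; the HFI dataset is not reproduced here.
X, _ = make_blobs(n_samples=1000, centers=4, n_features=5, random_state=0)


def evaluate_k(X, k_values, random_state=0):
    """Hypothetical helper: fit MiniBatchKMeans for each k and return
    {k: (mean silhouette, number of samples with negative silhouette)}."""
    results = {}
    for k in k_values:
        model = MiniBatchKMeans(
            n_clusters=k, batch_size=256, n_init=3, random_state=random_state
        )
        labels = model.fit_predict(X)
        # Mean silhouette coefficient over all samples.
        mean_s = silhouette_score(X, labels)
        # Count samples that appear closer to a neighbouring cluster than to their own.
        n_negative = int(np.sum(silhouette_samples(X, labels) < 0))
        results[k] = (mean_s, n_negative)
    return results


results = evaluate_k(X, range(2, 9))
for k, (mean_s, n_neg) in results.items():
    print(f"k={k}: mean silhouette={mean_s:.3f}, negative samples={n_neg}")

# Pick the highest mean silhouette; break ties by fewer negative samples.
best_k = max(results, key=lambda k: (results[k][0], -results[k][1]))
print("best k by mean silhouette:", best_k)
```

Ranking by the highest mean silhouette and breaking ties by the fewest negative samples is one simple way to combine the two criteria mentioned in the abstract; other combination rules are equally possible.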