Abstract
Given the hazards associated with unstable ground conditions, it is vital to understand the soil and rock characteristics that govern foundation construction and groundwater development. However, inherent challenges in geophysics, such as the non-uniqueness of the inverse problem and incomplete knowledge of the subsurface, hinder the direct interpretation of geophysical data in terms of geological units. Traditional soil exploration methods, or reliance on a single geophysical survey method, often yield inaccurate results because they cannot fully map subsurface complexity and heterogeneity. This study addresses these challenges by applying K-means cluster analysis to a univariate geophysical parameter set spanning an 800 m section in the geothermally active Kabota-Tawau area of Sabah, Malaysia. Using unsupervised machine learning techniques, including principal component analysis together with the Silhouette and elbow methods, the study determines the optimal number of clusters (\(k\)) and validates the resulting clustering. The analysis identifies four distinct lithologic units, which serve as proxies for soil/rock properties in the study area. With an R-squared value nearing 1 and an average Silhouette score of 0.67 for \(k=4\), the results indicate well-separated clusters, supported by a percentage sum of squared error exceeding 88%. This approach improves the identification of lithologic units and thereby the reliability of foundation construction and groundwater development efforts.
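The cluster-selection workflow summarised above (K-means, the elbow criterion on the within-cluster sum of squared errors, and the mean Silhouette score) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic one-dimensional values drawn around four arbitrary means, the function names `assign`, `kmeans_1d`, and `mean_silhouette` are hypothetical, and the "explained" percentage is computed as \(100\,(1 - \mathrm{SSE}/\mathrm{TSS})\), which is only one possible reading of the percentage sum of squared error quoted in the abstract. Principal component analysis is omitted because the example is univariate.

```python
import random


def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values]


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's K-means on 1-D data with deterministic quantile initialisation."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(n_iter):
        labels = assign(values, centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = assign(values, centers)
    sse = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, centers, sse


def mean_silhouette(values, labels):
    """Average silhouette coefficient, using |x - y| as the distance."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    total = 0.0
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        b = min(sum(abs(v - u) for u in other) / len(other)
                for key, other in clusters.items() if key != lab)
        total += (b - a) / max(a, b)
    return total / len(values)


# Synthetic stand-in for a univariate geophysical parameter (NOT study data):
# 30 samples around each of four assumed means.
rng = random.Random(42)
true_means = [0.5, 1.5, 2.5, 3.5]
values = [rng.gauss(m, 0.1) for m in true_means for _ in range(30)]

mean_all = sum(values) / len(values)
tss = sum((v - mean_all) ** 2 for v in values)  # total sum of squares

sse_by_k, sil_by_k, explained_by_k = {}, {}, {}
for k in range(2, 7):
    labels, centers, sse = kmeans_1d(values, k)
    sse_by_k[k] = sse                                # elbow curve
    sil_by_k[k] = mean_silhouette(values, labels)    # separation quality
    explained_by_k[k] = 100.0 * (1.0 - sse / tss)    # assumed definition

best_k = max(sil_by_k, key=sil_by_k.get)
for k in sorted(sse_by_k):
    print(f"k={k}  SSE={sse_by_k[k]:.3f}  "
          f"silhouette={sil_by_k[k]:.3f}  explained={explained_by_k[k]:.1f}%")
print("k with highest mean silhouette:", best_k)
```

In this sketch the elbow is read from the point where the SSE stops dropping sharply as \(k\) increases, while the Silhouette score gives an independent check on how well separated the clusters are. Field data would typically need scaling and a more robust initialisation than the quantile seeding assumed here.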