Affiliation:
1. Department of Petroleum Engineering, Federal University of Technology Owerri (FUTO), Owerri, Imo State
Abstract
Lithology classification in geothermal exploration is of great significance for understanding subsurface geology and geophysics, which can enhance the exploration and exploitation of geothermal resources. Alongside established industrial methods of classifying lithologies, machine learning models have shown promising results in this regard. However, some of these models perform poorly because the lithology classes to be predicted are imbalanced. Hence, this study investigated robust class imbalance handling techniques for efficiently classifying lithology in a geothermal field. The techniques investigated, namely the Synthetic Minority Oversampling Technique (SMOTE), Random Oversampling (RO), Random Undersampling (RU) and Near Miss Undersampling (NMU), were each combined with two ensemble bagging methods: the Random Forest Classifier (RFC) and the Balanced Bagging Classifier (BBC). The F1 score was the key evaluation metric because it accounts for both precision and recall, giving a more comprehensive picture of model performance. Using real-time drilling data (mud flow in, rate of penetration (ROP), surface torque, pump pressure and rotary speed) as input parameters, RFC performed better than BBC when paired with the resampling techniques. Moreover, RFC combined with RU greatly outperformed all other combinations in predicting the geothermal lithology, achieving F1 scores of 93.6% for the minority class (Plutonic) and 99.3% for the majority class (Alluvium) on the testing dataset, whereas the other combinations had F1 scores below 37%.
This result, alongside other insights from the study, showed that class imbalance handling techniques can be efficiently adopted to build more robust machine learning models for geothermal resource exploration, where high temperatures and unfavorable subsurface conditions limit the use of traditional methods.
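To make the resampling and evaluation steps described above concrete, here is a minimal, dependency-free sketch of two of the techniques named in the abstract, random undersampling (RU) and random oversampling (RO), together with the per-class F1 score, F1 = 2PR / (P + R). It assumes tabular rows of drilling features with one lithology label per row. The function names and toy values are illustrative assumptions, not the authors' code or data; SMOTE, NearMiss and the RFC/BBC classifiers are not reproduced here.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    return by_class


def random_undersample(X, y, seed=0):
    """Random undersampling (RU): keep a random subset of each class,
    without replacement, so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Random oversampling (RO): duplicate randomly chosen rows, with
    replacement, so every class matches the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_max = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def per_class_f1(y_true, y_pred):
    """F1 = 2PR / (P + R), computed separately for each class label."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


if __name__ == "__main__":
    # Toy rows [ROP, surface torque]: made-up values, NOT data from the study.
    X = [[float(i), 1.0] for i in range(10)]
    y = ["Alluvium"] * 8 + ["Plutonic"] * 2
    _, y_ru = random_undersample(X, y)
    _, y_ro = random_oversample(X, y)
    print("after RU:", Counter(y_ru))
    print("after RO:", Counter(y_ro))
    # Hypothetical predictions on a held-out split, for illustration only.
    print(per_class_f1(["Alluvium", "Alluvium", "Alluvium", "Plutonic"],
                       ["Alluvium", "Alluvium", "Plutonic", "Plutonic"]))
```

In practice these steps are usually done with imbalanced-learn (`RandomUnderSampler`, `RandomOverSampler`, `SMOTE` and `NearMiss`, each exposing `fit_resample`, plus `BalancedBaggingClassifier`) and scikit-learn (`RandomForestClassifier`, and `f1_score` with `average=None` for per-class scores). Resampling is normally applied to the training split only, so that test-set F1 scores reflect the original class distribution.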