Abstract
Choke equations are usually developed for specific conditions, then calibrated and validated against separator data. Data-driven models offer considerable scope to improve the accuracy of gas rate predictions in this area. In this work, a data-driven approach uses machine learning (ML) techniques to predict gas rate from a given set of measured inputs.
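For orientation, here is a minimal sketch of a Gilbert-type choke equation and of calibrating it against separator-measured rates. It assumes Gilbert's classical form P_wh = c R^a q / S^b, with P_wh in psig, R (gas-liquid ratio) in Mscf/bbl, q (liquid rate) in bbl/d, S (choke size) in 64ths of an inch, and Gilbert's published coefficients c = 435, a = 0.546, b = 1.89. Gas rate is taken as R times the liquid rate. The function names are hypothetical. Recalibrating only the leading constant is an illustrative assumption; the abstract does not say how its "modified" Gilbert equation was built.

```python
"""Gilbert-type choke equation with a recalibrated leading constant (sketch).

Hypothetical helper names; default coefficients are Gilbert's classical
values (c=435, a=0.546, b=1.89) and are assumptions, not this study's fit.
"""


def gilbert_liquid_rate(p_wh, choke_64ths, glr, c=435.0, a=0.546, b=1.89):
    """Liquid rate (bbl/d) from P_wh = c * R**a * q / S**b, solved for q.

    p_wh: flowing wellhead pressure (psig)
    choke_64ths: choke size in 64ths of an inch
    glr: gas-liquid ratio R (Mscf/bbl)
    """
    return p_wh * choke_64ths**b / (c * glr**a)


def gilbert_gas_rate(p_wh, choke_64ths, glr, c=435.0, a=0.546, b=1.89):
    """Gas rate (Mscf/d) as GLR (Mscf/bbl) times liquid rate (bbl/d)."""
    return glr * gilbert_liquid_rate(p_wh, choke_64ths, glr, c, a, b)


def calibrate_constant(samples, measured_gas_rates, a=0.546, b=1.89):
    """Least-squares fit of the leading constant c to separator gas rates.

    samples: iterable of (p_wh, choke_64ths, glr) tuples.
    With x_i = gas rate evaluated at c = 1, the model is q_i = x_i / c.
    Minimizing sum((q_i - k * x_i)**2) over k = 1/c gives
    k = sum(x * q) / sum(x * x), so c = 1 / k.
    """
    xs = [gilbert_gas_rate(p, s, r, c=1.0, a=a, b=b) for p, s, r in samples]
    num = sum(x * q for x, q in zip(xs, measured_gas_rates))
    den = sum(x * x for x in xs)
    return den / num
```

Because the exponents stay fixed, the fit reduces to a one-parameter closed form. A fuller recalibration would also refit a and b, for example by linear regression on logarithms.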
A structured, relational dataset was built from 1,181 wells tested with a full separator setup. Eighteen variables, including choke size, flowing wellhead pressure, fluid flow rates, gas-liquid ratios, and PVT properties, were captured. They were then processed through data cleaning, feature engineering with correlation matrices, feature scaling, and other techniques, which reduced them to 11 important variables. The ML workflow comprised exploratory data analysis, a 65:35 split into training and holdout sets, fivefold cross validation, and feature-importance analysis. It compared seven regression algorithms and a neural network.
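The preprocessing and validation steps above can be sketched with NumPy as follows. Several details are assumptions not stated in the abstract: the correlation cut-off (|r| > 0.95 here), the greedy keep-first rule for dropping correlated features, min-max scaling as the scaling method, and random shuffling before the 65:35 split. All function names are hypothetical.

```python
import numpy as np


def drop_correlated_features(X, names, threshold=0.95):
    """Greedy correlation filter (hypothetical rule and threshold).

    Walks the columns in order and keeps a feature only if its absolute
    Pearson correlation with every already-kept feature is <= threshold.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


def min_max_scale(X_train, X_test):
    """Scale to [0, 1] using training-set statistics only (assumed method)."""
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X_train - lo) / span, (X_test - lo) / span


def holdout_and_folds(n_rows, train_frac=0.65, n_folds=5, seed=0):
    """Shuffle, split 65:35 into train/holdout, then build k-fold pairs.

    Returns the train indices, the holdout indices, and a list of
    (fit_indices, validation_indices) pairs drawn from the train part only,
    so the holdout set never takes part in cross validation or tuning.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_train = int(round(train_frac * n_rows))
    train_idx, holdout_idx = idx[:n_train], idx[n_train:]
    folds = np.array_split(train_idx, n_folds)
    cv_pairs = [
        (np.concatenate([f for i, f in enumerate(folds) if i != k]), folds[k])
        for k in range(n_folds)
    ]
    return train_idx, holdout_idx, cv_pairs
```

With the 448 wells retained after QC, for example, `holdout_and_folds(448)` yields 291 training rows, split into five folds, and 157 holdout rows.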
Exploratory data analysis prompted further data cleaning based on missing values and data QC, leaving data from 448 wells. This final dataset was used for cross validation and hyperparameter tuning. The algorithms were compared using root mean squared error (RMSE), average absolute percentage error (AAPE), mean absolute percentage error (MAPE), and R-squared (R2). The XGBoost model performed best, with an AAPE and MAPE of 1.9%, a mean squared error (MSE) of 0.04, and an R2 of 0.996. An application based on the XGBoost model was therefore developed to improve gas rate accuracy during flowback operations. Feature-importance analysis showed that total condensate volume and condensate gas ratio were the most important predictors. The traditional and modified Gilbert choke equations have an average error of ~5 to 8% and an RMSE of ~14. The error of the ML approach is several-fold lower because the data-driven model captures and handles the variance better. In addition, the choke equations yield much higher errors under extreme conditions such as low gas rates and high water-condensate ratios. An error comparison with several analytical models is also presented and shows that XGBoost outperformed the existing models. Finally, to reduce computational time and resources, an attempt to map the XGBoost output onto a simple linear equation is presented, along with its accuracy.
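The following sketch computes the error metrics under their conventional definitions and shows one way to map a trained model's output onto a linear equation. AAPE is not coded separately because the abstract does not say how it differs from MAPE. The linear mapping uses an ordinary least-squares fit of the model's predictions on the input features. This is an assumed technique, since the abstract does not describe its method. Both function names are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAPE (%) and R^2 under their conventional definitions.

    MAPE divides by y_true, so measured rates must be nonzero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err**2))
    ss_res = float(np.sum(err**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MSE": mse,
        "RMSE": mse**0.5,
        "MAPE_%": float(100.0 * np.mean(np.abs(err / y_true))),
        "R2": 1.0 - ss_res / ss_tot,
    }


def fit_linear_surrogate(X, model_output):
    """Hypothetical linear surrogate: model_output ~ b0 + X @ b by least squares.

    X: (n_samples, n_features) inputs; model_output: e.g. a trained
    model's gas-rate predictions for those inputs. Returns (b0, b).
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # intercept column + features
    coef, *_ = np.linalg.lstsq(A, np.asarray(model_output, dtype=float), rcond=None)
    return coef[0], coef[1:]
```

The surrogate's accuracy can then be checked by passing measured rates and the surrogate's predictions `b0 + X @ b` back into `regression_metrics`.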
Data-driven approaches can provide substantial operational and cost relief in surface well testing, yet this space remains largely unexplored. This work demonstrates the value and benefits of ML models in the well testing domain.