Affiliations:
1. Sinopec Research Institute of Petroleum Engineering
2. The University of Texas at Austin
Abstract
Unconventional hydrocarbon resources, including shale hydrocarbons, tight hydrocarbons and coalbed methane, have become an increasingly essential part of the global oil and gas supply over the past decades. Tight oil and gas projects, especially in Canada, exhibit a number of distinctive features, such as large onshore geographical areas, variable but relatively low productivity, intensive drilling and formation stimulation programs, and complicated operational processes. These features differentiate unconventional formations from conventional ones, so traditional knowledge and methodology cannot simply be applied to unconventional resource plays. To develop a tight hydrocarbon formation with lower capital, expense and lifting costs, this paper introduces a novel data-mining-based model for tight oil production prediction and for the optimization of Geography / Petrophysics / Engineering parameters. The model uses machine learning analysis to establish a correlation between Estimated Ultimate Recovery (EUR) and key independent geography, petrophysics and engineering parameters.
In this study, data comprising more than 50 variables from the Cardium tight oil formation in Canada, including production, well logging, well testing, seismic, laboratory experiments and other tests, were collected and used for the analysis. First, multiple sets of cumulative production data and the relevant Geography / Petrophysics / Engineering (GPE) parameters were collected. A sensitivity test was then carried out to determine the most important GPE parameters, and a spatial database was set up. Based on the sensitivity test and data mining results, several key parameters were identified and used as independent variables for the machine learning analysis. Among the machine learning algorithms considered, Random Forest is applied to evaluate the relationship between EUR and the multiple independent variables. To improve model accuracy, a supervised learning algorithm is applied to train the model.
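The screening step described above can be illustrated with a minimal sketch. The abstract does not specify how the sensitivity test is performed, so the example below swaps in a simple stand-in: ranking candidate parameters by the absolute Pearson correlation with EUR. The column names, the synthetic well values and the toy EUR relationship are all hypothetical and are not Cardium data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column carries no information, so report zero correlation.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def rank_parameters(columns, target):
    """Rank parameter names by |correlation| with the target, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic wells (hypothetical values, for illustration only).
rng = random.Random(0)
n_wells = 200
columns = {
    "stimulated_length": [rng.uniform(800, 2000) for _ in range(n_wells)],
    "proppant_per_length": [rng.uniform(0.2, 1.5) for _ in range(n_wells)],
    "unrelated_noise": [rng.gauss(0, 1) for _ in range(n_wells)],
}

# Assumed toy relationship: EUR depends on the first two columns only.
eur = [
    0.05 * sl + 40.0 * ppl + rng.gauss(0, 5)
    for sl, ppl in zip(columns["stimulated_length"], columns["proppant_per_length"])
]

ranking = rank_parameters(columns, eur)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A linear correlation ranking only captures monotonic, roughly linear effects; a study with more than 50 variables would likely need a screening method that also handles interactions and correlated inputs.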
Based on the sensitivity analysis results, the following metrics are recognized as the most important and sensitive independent variables for production prediction in the Cardium tight oil formation: Well Location (WL), Resource Density (RD), True Vertical Depth (TVD), Stimulated Length (SL), Total Stage Count (TSC), Pumped Proppant Per Length (PPL), Pumped Fluid Per Length (PFL), Sand Concentration (SC) and Injection Rate (IR). Models are established with different machine learning algorithms, and their prediction results are compared and discussed in detail. The prediction accuracy of Random Forest reaches as high as 90%, which is much higher than that of the other machine learning algorithms. Therefore, the predictive model based on Random Forest is used by Sinopec as a feasible tool for economic evaluation.
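To make the Random Forest idea concrete, the following is a minimal standard-library sketch of random forest regression: each tree is grown on a bootstrap sample, each split considers a random subset of features, and the forest prediction is the average over trees. It is not the authors' implementation. The abstract does not define how accuracy is measured, so the coefficient of determination (R²) on held-out samples is used here as an assumed stand-in, and the three-feature synthetic data set is hypothetical.

```python
import random


def mean(vals):
    return sum(vals) / len(vals)


def sse(vals):
    """Sum of squared deviations from the mean."""
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals)


def build_tree(X, y, rng, depth=0, max_depth=4, min_leaf=5, n_try=2):
    """Grow a regression tree; each split tries a random subset of features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)  # leaf: predict the mean target
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        values = sorted(set(row[f] for row in X))
        # Subsample candidate thresholds to keep the search cheap.
        for t in values[:: max(1, len(values) // 10)]:
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:
        return mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    args = (rng, depth + 1, max_depth, min_leaf, n_try)
    return (
        f,
        t,
        build_tree([X[i] for i in li], [y[i] for i in li], *args),
        build_tree([X[i] for i in ri], [y[i] for i in ri], *args),
    )


def predict_tree(node, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=30, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    return mean([predict_tree(tree, row) for tree in forest])


def r2(y_true, y_pred):
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return 1.0 - ss_res / sse(y_true)


# Hypothetical data: two informative features and one irrelevant feature.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(400)]
y = [10.0 * r[0] + 5.0 * r[1] + data_rng.gauss(0, 0.5) for r in X]
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

forest = fit_forest(X_train, y_train)
forest_r2 = r2(y_test, [predict_forest(forest, row) for row in X_test])
baseline_r2 = r2(y_test, [mean(y_train)] * len(y_test))  # constant-mean predictor
print(f"forest R2 = {forest_r2:.2f}, baseline R2 = {baseline_r2:.2f}")
```

Scoring on held-out wells rather than on the training wells is what makes a comparison between algorithms meaningful; the same `r2` function could score any other model fitted to the same split.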
The data, methodology, models and predictions presented in this paper can potentially bring significant new value to the industry. This study offers insight into the tight hydrocarbon production mechanism from a big data mining perspective, as well as a feasible and accurate method to predict production and evaluate project economic feasibility in the Cardium formation. In addition, different machine learning algorithms, based on geography, petrophysics and engineering data with more than 50 variables in the Cardium formation, are summarized and compared for the first time. The methodology discussed in this paper can be readily applied to other unconventional fields and formations, such as the Montney, Eagle Ford, Fuling and Bakken shale plays, to predict production accurately, provided that the data are available.