Abstract
Reactive and biased human decision-making during well construction operations can result in problems ranging from minor inefficiencies to events that can have far-reaching negative consequences for safety, environmental compliance and cost. A system that can automatically generate an optimal action sequence from any given state to meet an operation’s objectives is therefore highly desirable. Moreover, an intelligent agent capable of self-learning can offset the computation and memory costs associated with evaluating the action space, which is often vast. This paper details the development of such action planning systems using reinforcement learning techniques. The concept of self-play used by game AI engines (such as AlphaGo and AlphaZero from Google’s DeepMind) is adapted here for well construction tasks, wherein a drilling agent learns and improves from scenario simulations performed using digital twins. The first step in building such a system is to formulate the given well construction task as a Markov Decision Process (MDP). Planning is then accomplished using Monte Carlo tree search (MCTS), a simulation-based search technique. Simulations, based on the MCTS’s tree and rollout policies, are performed in an episodic manner using a digital twin of the underlying task(s). The results of these episodic simulations are then used for policy improvement. Domain-specific heuristics are included for further policy enhancement, considering factors such as trade-offs between safety and performance, the distance to the goal state, and the feasibility of taking specific actions from specific states. We demonstrate our proposed action planning system for hole cleaning, a task that to date has proven difficult to optimize and automate.
A comparison of the action sequences generated by our system with real field data shows that the system would have achieved significantly better hole cleaning performance than the actions taken in the field, as quantified by the final state reached and the long-term reward. Such intelligent sequential decision-making systems, which use heuristics and exploration–exploitation trade-offs for optimum results, are novel applications in well construction and may pave the way for the automation of tasks that until now have been exclusively controlled by humans.
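The planning loop described in the abstract (an MDP formulation, MCTS with a tree policy and a rollout policy, and episodic simulation) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy hole cleaning MDP below, including its state (cuttings-bed level, steps left), the three actions, their rewards and the goal bonus, is entirely hypothetical and stands in for the digital twin. The tree policy is assumed to be standard UCT, and the rollout policy is uniformly random rather than guided by the domain heuristics the paper mentions.

```python
import math
import random

# Hypothetical toy MDP (an assumption, not the paper's model).
# State: (cuttings_bed_level, steps_left). Bed level 0 is the goal state.
# Each action maps to (bed levels removed, immediate reward).
ACTIONS = {
    "low_flow": (1, -1.0),
    "high_flow": (2, -2.5),
    "high_flow_rotate": (3, -4.5),
}
GOAL_BONUS = 10.0  # assumed reward for reaching a clean hole


def step(state, action):
    """Stand-in for one digital-twin simulation step."""
    bed, steps_left = state
    removed, reward = ACTIONS[action]
    new_bed = max(0, bed - removed)
    if new_bed == 0:
        reward += GOAL_BONUS
    return (new_bed, steps_left - 1), reward


def is_terminal(state):
    return state[0] == 0 or state[1] == 0


class Node:
    def __init__(self, state, parent=None, reward_in=0.0):
        self.state = state
        self.parent = parent
        self.reward_in = reward_in  # reward on the transition into this node
        self.children = {}          # action -> Node
        self.visits = 0
        self.total = 0.0            # sum of sampled returns through this node


def uct_score(parent, child, c):
    """Tree policy: mean return plus an exploration bonus (UCT)."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def rollout(state, rng):
    """Rollout policy: uniformly random actions until the episode ends."""
    total = 0.0
    while not is_terminal(state):
        state, reward = step(state, rng.choice(list(ACTIONS)))
        total += reward
    return total


def mcts(root_state, n_iter=500, c=5.0, seed=0):
    """Return the most-visited root action after n_iter simulations."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # Selection: descend while the node is fully expanded.
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: uct_score(parent, ch, c))
        # Expansion: add one untried action.
        if not is_terminal(node.state):
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            next_state, reward = step(node.state, action)
            child = Node(next_state, parent=node, reward_in=reward)
            node.children[action] = child
            node = child
        # Simulation, then backpropagation of the return to the root.
        ret = rollout(node.state, rng)
        while node is not None:
            ret += node.reward_in
            node.visits += 1
            node.total += ret
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)


def plan(state, **mcts_kwargs):
    """Replan with MCTS at every step until a terminal state is reached."""
    sequence = []
    while not is_terminal(state):
        action = mcts(state, **mcts_kwargs)
        sequence.append(action)
        state, _ = step(state, action)
    return sequence, state
```

Calling `plan((6, 5))` returns an action sequence together with the final state reached, mirroring the two quantities the abstract uses for evaluation (final state and long-term reward). The exploration constant `c` and the iteration count are illustrative values only; in a real setting they would need tuning against the reward scale of the task.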
Subject
Energy (miscellaneous); Energy Engineering and Power Technology; Renewable Energy, Sustainability and the Environment; Electrical and Electronic Engineering; Control and Optimization; Engineering (miscellaneous); Building and Construction
Cited by
5 articles.