Advanced Machine Learning Methods for Production Data Pattern Recognition

Author:

Subrahmanya Niranjan (1), Xu Peng (2), El-Bakry Amr (3), Reynolds Carmon (4)

Affiliation:

1. ExxonMobil Research and Engineering Company

2. ExxonMobil Upstream Research Company

3. ExxonMobil Production Company

4. ExxonMobil Information Technology

Abstract

An important challenge for asset management is to analyze large amounts of data quickly enough to provide insightful information for timely decision making. Analyzing all available data manually is impractical and inefficient, so it is advantageous to develop pattern recognition algorithms that recognize events of interest. Conventional pattern recognition algorithms usually require a fairly large training set in which data points are carefully prepared and "labeled", for example by having subject-matter experts designate a piece of equipment's status as healthy or faulty. Manual labeling of this kind becomes time-consuming and error-prone when the training set is large. For most applications, a small number of data points have already been labeled by the experts through their routine activities. While these data points are usually not enough to form a training set for conventional pattern recognition methods, some of the newer methods can take advantage of them along with the manifold structure hidden in the unlabeled data. Moreover, subject-matter experts may be willing to provide more input if they are given a clear indication of the value of the information and a manageable, pre-selected subset of data to inspect. Many other industries face the same challenge: labels are too costly to acquire at scale, unlabeled data are plentiful, and expert time is limited. A suite of advanced machine learning algorithms (e.g., semi-supervised learning, active learning) has been developed to tackle this challenge, and many of them have been used successfully in various applications in the past few years. In this paper, we review the concepts and report our observations about the effectiveness of these methods in a real-world asset management scenario.
We consider well test validation in an asset with a large number of tests as an example of a label-rich data set that can serve as the basis for our numerical review of existing methods. Specifically, we look at the task of building a statistical model to recognize the validity of rate measurement tests in a test separator. In this case, the operators have labeled most of these tests as valid or invalid through their daily activities. The extensive well test validation data provide sufficient information to assess the newer approaches under review. The plan then is to apply a similar approach to tasks such as equipment health monitoring, for example identifying pump failures with limited expert input.
Exxon Mobil Corporation has numerous subsidiaries, many with names that include ExxonMobil, Exxon, Esso and Mobil. For convenience and simplicity in this paper, the parent company and its subsidiaries may be referenced separately or collectively as "ExxonMobil." Abbreviated references describing global or regional operational organizations and global or regional business lines are also sometimes used for convenience and simplicity. Nothing in this paper is intended to override the corporate separateness of these separate legal entities. Working relationships discussed in this paper do not necessarily represent a reporting connection, but may reflect a functional guidance, stewardship, or service relationship.
Conceptually, the amount of labeled input can be reduced by combining the information from the labels with the statistical distribution of the data (e.g., clusters). As an extreme example, suppose the pump measurement data show two distinct clusters, and the operators have labeled a few data points in one cluster as pump failures because reports had to be made when wells were shut in. This information is sufficient to label one of the clusters as healthy and the other as faulty. For a new measurement, a prediction may be made by first determining the cluster to which the measurement belongs and then assigning it the corresponding label. Most real-world problems are much more challenging than this example because of the number of data points, the dimensionality of the data, the lack of clear cluster structure and the potential ambiguity of data structures. Nevertheless, similar ideas can be used to develop highly accurate statistical models with a limited number of labels.
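The two-cluster example above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method evaluated in the paper: the measurements are made-up numbers, the clustering step is a plain two-centroid k-means chosen for brevity, and the names `two_means`, `name_clusters` and `predict` are hypothetical helpers introduced here.

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def two_means(points, iters=20):
    """Plain k-means with k=2; returns the two centroids."""
    # Deterministic start: lexicographically smallest and largest points.
    centroids = [min(points), max(points)]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # New centroid = mean of its group; keep the old one if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def name_clusters(points, centroids, faulty_idx):
    """The cluster holding most expert-labeled failures is 'faulty'; the other is 'healthy'."""
    votes = [nearest(points[i], centroids) for i in faulty_idx]
    faulty_cluster = max((0, 1), key=votes.count)
    return {faulty_cluster: "faulty", 1 - faulty_cluster: "healthy"}

def predict(p, centroids, names):
    """Assign a new measurement the label of its nearest cluster."""
    return names[nearest(p, centroids)]

# Hypothetical pump measurements (two features per reading); values are invented.
readings = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.2),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0), (5.3, 5.1),
]
# Assumption: operators flagged only readings 5 and 6 as pump failures.
centroids = two_means(readings)
cluster_names = name_clusters(readings, centroids, faulty_idx=[5, 6])
print(predict((4.9, 5.3), centroids, cluster_names))
```

Only two of the eight readings carry a label, yet every reading and any new measurement receives one, because the cluster structure of the unlabeled data does most of the work. With overlapping or ambiguous clusters this simple rule would not be reliable, which is the harder setting the abstract refers to.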

Publisher

SPE

Cited by 4 articles.