Towards Benchmarking for Evaluating Machine Learning Methods in Detecting Outliers in Process Datasets

Author:

Schindler Thimo F.1ORCID,Schlicht Simon2,Thoben Klaus-Dieter12ORCID

Affiliation:

1. BIBA-Bremer Institut für Produktion und Logistik GmbH, Hochschulring 20, D-28359 Bremen, Germany

2. Department of Integrated Product Development, University of Bremen, Bibliothekstraße 1, D-28359 Bremen, Germany

Abstract

Within the integration and development of data-driven process models, the underlying process is digitally mapped in a model through sensory data acquisition and subsequent modelling. In this process, challenges of different types and degrees of severity arise in each modelling step, according to the Cross-Industry Standard Process for Data Mining (CRISP-DM). Particularly in the context of data acquisition and integration into the process model, it can be assumed with a sufficiently high degree of probability that the acquired data contain anomalies of various kinds. The outliers must be detected in the data preparation and processing phase and dealt with accordingly. If this is sufficiently implemented, it will positively impact the subsequent modelling in terms of accuracy and precision. Therefore, this paper shows how outliers can be identified using the unsupervised machine learning methods autoencoder, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest (iForest), and One-Class Support Vector Machine (OCSVM). Following implementing these methods, we compared them by applying the Numenta Anomaly Benchmark (NAB) and sufficiently presented the individual strengths and disadvantages. Evaluating the correctness, distinctiveness and robustness criteria described in the paper showed that the One-Class Support Vector Machine was outstanding among the methods considered. This is because the OCSVM achieved acceptable anomaly detections on the available process datasets with comparatively little effort.

Funder

German Federal Ministry for Digital and Transport (BMDV) in the ”Innovative Port Technologies” (IHATEC II) program

Publisher

MDPI AG

Subject

Computer Networks and Communications,Human-Computer Interaction

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3