Affiliation:
1. Université Libre de Bruxelles
Abstract
Outlier detection and cleaning is an essential step in data preprocessing to ensure the integrity and validity of data analyses. This paper focuses on outlier points within individual trajectories, i.e., points that deviate significantly within a single trajectory. We benchmark ten open-source libraries to comprehensively evaluate the available tools, comparing their efficiency and accuracy in identifying and cleaning outliers. The benchmark evaluates the libraries as they are offered to end users, reflecting real-world applicability. We compare existing outlier detection libraries, introduce a method for establishing ground truth, and aim to guide users in choosing the most appropriate tool for their specific outlier detection needs. Furthermore, we survey the state-of-the-art algorithms for outlier detection and classify them into seven types: statistic-based methods, sliding window algorithms, clustering-based methods, graph-based methods, ensemble-based methods, learning-based methods, and heuristic-based methods. Our research provides insights into the performance of these libraries and contributes to the development of data preprocessing and outlier detection methodologies.
Publisher
Research Square Platform LLC
Cited by
2 articles.