Abstract
In many fields, such as data mining and machine learning, distance-based outlier detection (DOD) is widely used to remove noise and find abnormal phenomena, because DOD is unsupervised, can be applied in any metric space, and makes no assumptions about the data distribution. Data mining and machine learning applications now have to handle large datasets, which requires efficient DOD algorithms. We address the DOD problem under two different definitions. Our key idea for solving these problems is to exploit an in-memory proximity graph. For each problem, we propose a new algorithm that exploits a proximity graph, and we analyze which type of proximity graph is appropriate for that algorithm. Our empirical study on real datasets confirms that our DOD algorithms are significantly faster than state-of-the-art ones.
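The abstract does not spell out its two DOD definitions. The following is a minimal sketch that assumes the common distance-threshold (r, k) definition, under which an object is an outlier if fewer than k other objects lie within distance r of it. It contrasts a brute-force nested-loop baseline with a proximity-graph filter. This is an illustration, not the authors' algorithm: the graph, the function names (`count_neighbors`, `outliers_bruteforce`, `outliers_with_graph`), and the parameters are hypothetical, and Euclidean distance stands in for an arbitrary metric.

```python
import math
from collections import deque
from typing import Dict, List, Sequence

Point = Sequence[float]


def count_neighbors(data: List[Point], i: int, r: float, k: int) -> int:
    """Linear scan: count objects within distance r of data[i], stopping at k."""
    count = 0
    for j, q in enumerate(data):
        if j != i and math.dist(data[i], q) <= r:
            count += 1
            if count >= k:  # early termination: data[i] is already an inlier
                break
    return count


def outliers_bruteforce(data: List[Point], r: float, k: int) -> List[int]:
    """Indices of (r, k)-outliers: fewer than k other objects within distance r."""
    return [i for i in range(len(data)) if count_neighbors(data, i, r, k) < k]


def outliers_with_graph(data: List[Point], graph: Dict[int, List[int]],
                        r: float, k: int) -> List[int]:
    """Same answer as outliers_bruteforce, using a proximity graph as a filter.

    `graph` is a hypothetical adjacency list (e.g., a k-NN graph built offline).
    A breadth-first walk from each object expands only vertices within r of it.
    Every vertex it counts is a genuine neighbor, so reaching k certifies an
    inlier exactly. Objects the walk cannot certify fall back to a linear scan.
    """
    result = []
    for i, p in enumerate(data):
        seen = {i}
        queue = deque([i])
        count = 0
        while queue and count < k:
            u = queue.popleft()
            for v in graph.get(u, []):
                if v in seen:
                    continue
                seen.add(v)
                if math.dist(p, data[v]) <= r:
                    count += 1
                    queue.append(v)  # expand only through true neighbors
                    if count >= k:
                        break
        # Uncertified objects need an exact check, since the walk can miss neighbors.
        if count < k and count_neighbors(data, i, r, k) < k:
            result.append(i)
    return result


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: [3]}
print(outliers_bruteforce(data, 1.5, 2))         # [4]
print(outliers_with_graph(data, graph, 1.5, 2))  # [4]
```

Because the graph walk only ever counts true neighbors, the filtered version returns exactly the same outliers as the brute-force scan regardless of graph quality. A better graph only means that more inliers are certified early and fewer objects pay for the full linear scan.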
Publisher
Springer Science and Business Media LLC
Subject
Hardware and Architecture, Information Systems
Cited by
5 articles.
1. Outlier detection using iterative adaptive mini-minimum spanning tree generation with applications on medical data;Frontiers in Physiology;2023-10-13
2. Efficient Density-peaks Clustering Algorithms on Static and Dynamic Data in Euclidean Space;ACM Transactions on Knowledge Discovery from Data;2023-08-10
3. Fast Algorithm for Embedded Order Dependency Validation;35th International Conference on Scientific and Statistical Database Management;2023-07-10
4. Scalable and Accurate Density-Peaks Clustering on Fully Dynamic Data;2022 IEEE International Conference on Big Data (Big Data);2022-12-17
5. Learned k-NN distance estimation;Proceedings of the 30th International Conference on Advances in Geographic Information Systems;2022-11