Affiliation:
1. School of Fundamental and Applied Sciences, Assam Don Bosco University, Guwahati 782402, India
2. College of Computer Science and IT, Albaha University, Albaha 65799, Saudi Arabia
Abstract
Anomaly detection in real-time data is widely regarded as a vital area of research. Clustering techniques have been applied effectively to anomaly detection on many occasions. Because the data are generated in real time, the time at which each instance is generated matters. Most existing clustering-based methods follow either a partitioning or a hierarchical approach without treating the time attributes of the dataset distinctly. This article introduces a mixed clustering approach for this purpose that also takes time attributes into account. It is a two-phase method: the first phase follows a partitioning approach and the second an agglomerative hierarchical approach. The dataset may have mixed attributes. In phase one, a unified metric defined over the mixed attributes is used, and the same metric is used to merge similar clusters in phase two. The time stamp associated with each data instance is tracked simultaneously, so phase one produces clusters with different lifetimes. In phase two, similar clusters are merged together with their lifetimes: the lifetimes of clusters with overlapping cores are combined using the superimposition operation, producing a fuzzy time interval. In this way, each cluster has an associated fuzzy lifetime. Data instances that belong to sparse clusters, that do not belong to any cluster, or that fall within a fuzzy lifetime with a low membership value can be treated as anomalies. The efficacy of the algorithms is assessed through both complexity analysis and experimental studies. Experimental results on a real-world dataset and a synthetic dataset show that the proposed algorithm can detect anomalies with 90% and 98% accuracy, respectively.
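The time-related step of phase two, superimposing the lifetimes of merged clusters into a fuzzy lifetime and flagging instances with low membership, can be illustrated with a minimal sketch. It assumes a common reading of the superimposition operation, in which the membership of a time point is the fraction of the superimposed crisp lifetimes that contain it. The function names `fuzzy_lifetime` and `is_time_anomaly` and the 1/2 threshold are hypothetical and are not taken from the article. The unified mixed-attribute metric is not reproduced, because the abstract does not define it.

```python
from fractions import Fraction

# Hypothetical representation: a cluster lifetime is a closed interval (start, end)
# on a shared time axis.
Interval = tuple[float, float]


def fuzzy_lifetime(lifetimes: list[Interval]):
    """Superimpose crisp lifetimes into one fuzzy lifetime (assumed operation).

    Returns a membership function: for a time t, the fraction of the input
    lifetimes that contain t.
    """
    n = len(lifetimes)

    def membership(t: float) -> Fraction:
        # Count how many of the merged clusters' lifetimes cover time t.
        covering = sum(1 for start, end in lifetimes if start <= t <= end)
        return Fraction(covering, n)

    return membership


def is_time_anomaly(t: float, membership, threshold=Fraction(1, 2)) -> bool:
    """Flag t when its membership falls below a hypothetical threshold."""
    return membership(t) < threshold


# Two clusters with overlapping lifetimes are merged in phase two.
mu = fuzzy_lifetime([(1, 5), (3, 8)])
print(mu(4))  # 1    (inside the overlap of both lifetimes)
print(mu(2))  # 1/2  (covered by one of the two lifetimes)
print(mu(9))  # 0    (outside both lifetimes)
print(is_time_anomaly(9, mu))  # True
```

Exact fractions are used here so that membership values such as 1/2 stay exact when compared with the threshold. A real implementation would also need the article's own rules for choosing the threshold and for handling sparse clusters.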
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by 4 articles.