Author:
Sun Wu, Li Hui, Liang Qingqing, Zou Xiaofeng, Chen Mei, Wang Yanhao
Abstract
In machine learning (ML) problems, it is widely believed that more training samples lead to higher predictive accuracy but incur higher computational costs. Achieving better data efficiency, that is, a better trade-off between the size of the training set and the accuracy of the output model, therefore becomes a key problem in ML applications. In this work, we systematically investigate the data efficiency of Univariate Time Series Anomaly Detection (UTS-AD) models. We first experimentally examine the performance of nine popular UTS-AD algorithms as a function of the training sample size on several benchmark datasets. Our findings confirm that most algorithms become more accurate as more training samples are used, while the marginal gain from adding further samples gradually decreases. Based on these observations, we propose a novel framework called FastUTS-AD that achieves improved data efficiency and reduced computational overhead compared to existing UTS-AD models with little loss of accuracy. Specifically, FastUTS-AD is compatible with different UTS-AD models and uses a sampling- and scaling-law-based heuristic to automatically determine the number of training samples a UTS-AD model needs to reach predictive performance close to that obtained when all samples in the training set are used. Comprehensive experimental results show that, for the nine popular UTS-AD algorithms tested, FastUTS-AD reduces the number of training samples and the training time by 91.09–91.49% and 93.49–93.82% on average, respectively, without a significant decrease in accuracy.
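The abstract describes the heuristic only at a high level. The sketch below is one plausible reading of a sampling- and scaling-law-based stopping rule, not the authors' published algorithm: train on progressively larger subsets, fit a saturating power law to the observed validation scores, and return the smallest size whose predicted score is within a tolerance of the fitted plateau. The function names, the particular power-law form, and the tolerance value are all illustrative assumptions.

import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a_inf, b, c):
    # Saturating power law: score approaches the plateau a_inf as n grows.
    return a_inf - b * np.power(n, -c)

def pick_training_size(sizes, scores, tol=0.01):
    # Fit the scaling law to (subset size, validation score) pairs, then
    # return the smallest size whose predicted score is within `tol` of
    # the fitted plateau a_inf.
    sizes = np.asarray(sizes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    p0 = (scores.max(), 1.0, 0.5)  # initial guess: plateau near best score
    (a_inf, b, c), _ = curve_fit(scaling_law, sizes, scores, p0=p0, maxfev=10000)
    # Search a geometric grid of candidate sizes for the stopping point.
    for n in np.geomspace(sizes.min(), sizes.max(), num=200):
        if a_inf - scaling_law(n, a_inf, b, c) <= tol:
            return int(n)
    return int(sizes.max())  # fall back to all available samples

# Hypothetical scores measured after training on growing subsets.
sizes = [500, 1000, 2000, 4000, 8000, 16000]
scores = [0.71, 0.78, 0.83, 0.86, 0.875, 0.88]
print(pick_training_size(sizes, scores, tol=0.01))

In this reading, the geometric candidate grid mirrors the diminishing marginal gains the paper reports: doublings of the training set matter early and matter less near the plateau.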
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Cited by
1 article.