Author:
Vu Hung, Nguyen Tu Dinh, Le Trung, Luo Wei, Phung Dinh
Abstract
Detecting anomalies in surveillance videos has long been an important but unsolved problem. In particular, many existing solutions are overly sensitive to (often ephemeral) visual artifacts in the raw video data, resulting in false positives and fragmented detection regions. To overcome such sensitivity and to capture true anomalies with semantic significance, one natural idea is to seek validation from abstract representations of the videos. This paper introduces a framework for robust anomaly detection using multilevel representations of both intensity and motion data. The framework consists of three main components: 1) representation learning using Denoising Autoencoders, 2) level-wise representation generation using Conditional Generative Adversarial Networks, and 3) consolidation of anomalous regions detected at each representation level. Our proposed multilevel detector shows a significant improvement in pixel-level Equal Error Rate, namely 11.35%, 12.32% and 4.31% on the UCSD Ped 1, UCSD Ped 2 and Avenue datasets, respectively. In addition, the model allowed us to detect mislabeled anomalies in the UCSD Ped 1 dataset.
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
49 articles.