Abstract
AbstractTemporal inconsistency is the annoying artifact that has been commonly introduced in low-light video enhancement, but current methods tend to overlook the significance of utilizing both data-centric clues and model-centric design to tackle this problem. In this context, our work makes a comprehensive exploration from the following three aspects. First, to enrich the scene diversity and motion flexibility, we construct a synthetic diverse low/normal-light paired video dataset with a carefully designed low-light simulation strategy, which can effectively complement existing real captured datasets. Second, for better temporal dependency utilization, we develop a Temporally Consistent Enhancer Network (TCE-Net) that consists of stacked 3D convolutions and 2D convolutions to exploit spatial-temporal clues in videos. Last, the temporal dynamic feature dependencies are exploited to obtain consistency constraints for different frame indexes. All these efforts are powered by a Spatial-Temporal Compatible Learning (STCL) optimization technique, which dynamically constructs specific training loss functions adaptively on different datasets. As such, multiple-frame information can be effectively utilized and different levels of information from the network can be feasibly integrated, thus expanding the synergies on different kinds of data and offering visually better results in terms of illumination distribution, color consistency, texture details, and temporal coherence. Extensive experimental results on various real-world low-light video datasets clearly demonstrate the proposed method achieves superior performance to state-of-the-art methods. Our code and synthesized low-light video database will be publicly available at https://github.com/lingyzhu0101/low-light-video-enhancement.git.
Funder
City University of Hong Kong
Publisher
Springer Science and Business Media LLC