Dense Semantic Forecasting with Multi-Level Feature Warping
-
Published:2022-12-28
Issue:1
Volume:13
Page:400
-
ISSN:2076-3417
-
Container-title:Applied Sciences
-
language:en
-
Short-container-title:Applied Sciences
Author:
Sović Iva, Šarić Josip, Šegvić Siniša
Abstract
Anticipation of per-pixel semantics in a future unobserved frame is also known as dense semantic forecasting. State-of-the-art methods are based on single-level regression of a subsampled abstract representation of a recognition model. However, single-level regression cannot account for skip connections from the backbone to the upsampling path. We propose to address this shortcoming by warping shallow features from observed images with upsampled feature flow. Our goal is not straightforward, since warping with coarse feature flow introduces noise into the forecasted features. We therefore base our work on single-frame models that are more resistant to the noise in skip connections. To achieve this, we propose a training procedure that enables recognition models to operate reasonably well with or without skip connections. Validation experiments reveal interesting insights into the influence of particular skip connections on recognition accuracy. Our forecasting method delivers 70.2% mIoU 0.18 s into the future and 58.5% mIoU 0.54 s into the future. These experiments show 0.6 mIoU points of improved accuracy with respect to the baseline and reveal promising directions for future work.
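The warping step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `upsample_flow` and `warp_backward`, the nearest-neighbour upsampling, the backward-warping direction, and the border clamping are all assumptions made for illustration. The sketch takes a coarse feature flow, brings it to the resolution of a shallow feature map, and samples that map bilinearly at the displaced locations.

```python
import numpy as np


def upsample_flow(flow, factor):
    """Nearest-neighbour upsampling of a (2, h, w) flow field (assumed scheme).

    Displacements are multiplied by `factor` so that they stay expressed
    in pixel units of the finer grid.
    """
    up = np.repeat(np.repeat(flow, factor, axis=1), factor, axis=2)
    return up * factor


def warp_backward(features, flow):
    """Backward-warp (C, H, W) features with a (2, H, W) flow field.

    flow[0] holds the horizontal (x) and flow[1] the vertical (y) offset
    of the source location for every output pixel. Sampling is bilinear;
    out-of-range coordinates are clamped to the border (an assumption).
    """
    _, H, W = features.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(xs + flow[0], 0, W - 1)
    src_y = np.clip(ys + flow[1], 0, H - 1)

    # Integer corners surrounding each source location.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)

    # Bilinear interpolation weights.
    wx = src_x - x0
    wy = src_y - y0

    top = features[:, y0, x0] * (1 - wx) + features[:, y0, x1] * wx
    bottom = features[:, y1, x0] * (1 - wx) + features[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Toy usage: a 4x4 single-channel "shallow feature" map and a 2x2 coarse flow.
feat = np.arange(16, dtype=float).reshape(1, 4, 4)
coarse = np.zeros((2, 2, 2))
coarse[0] = 0.5                      # half a coarse pixel to the right
fine = upsample_flow(coarse, 2)      # one fine pixel to the right
warped = warp_backward(feat, fine)
```

Because each coarse flow vector is shared by a whole block of fine pixels, the warped features are only approximately aligned, which is one way to picture the noise in skip connections that the abstract refers to.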
Funder
Rimac Technologies, Croatian Science Foundation, European Regional Development Fund
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science