Authors:
Singha Tanmay, Pham Duc-Son, Krishna Aneesh
Abstract
Urban street scene analysis is an important problem in computer vision, and many off-line models achieve outstanding semantic segmentation results. However, developing and optimising deep neural architectures that run in real time with low computing requirements whilst maintaining good performance remains an ongoing challenge for the research community. Balancing model complexity against performance has been a major hurdle: many models lose too much accuracy for a slight reduction in model size and are unable to handle high-resolution input images. This study aims to address this issue with a novel model, named M2FANet, that provides a much better balance between efficiency and accuracy for scene segmentation than other alternatives. The proposed optimised backbone helps to increase the model's efficiency, whereas the proposed Multi-level Multi-path (M2) feature aggregation approach enhances the model's performance in a real-time environment. By exploiting a multi-feature scaling technique, M2FANet produces state-of-the-art results in resource-constrained situations while handling full input resolution. On the Cityscapes benchmark data set, the proposed model achieves 68.5% and 68.3% class accuracy on the validation and test sets respectively, whilst having only 1.3 million parameters. Compared with all real-time models of fewer than 5 million parameters, the proposed model is the most competitive in both performance and real-time capability.
Cited by
1 article.
1. SDBNet: Lightweight Real-time Semantic Segmentation Using Short-term Dense Bottleneck;2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA);2022-11-30