Abstract
Semantic segmentation, one of the key problems in computer vision, has been applied in various domains such as autonomous driving, robot navigation, and medical imaging, to name a few. Recently, deep learning, especially deep neural networks, has shown significant performance improvements over conventional semantic segmentation methods. In this paper, we present a novel encoder-decoder deep neural network, named XSeNet, that can be trained end-to-end in a supervised manner. We adapt ResNet-50 layers as the encoder and design a cascaded decoder that consists of a stack of X-Modules, which enables the network to learn dense contextual information and to have a wider field-of-view. We evaluate our method on the CamVid dataset, and experimental results reveal that our method can segment most parts of the scene accurately and even outperforms previous state-of-the-art methods.
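The abstract credits the cascaded X-Module decoder with a wider field-of-view. The internals of the X-Module are not given here, so as a hedged illustration only, the sketch below shows how dilation, a common mechanism for enlarging the receptive field in segmentation decoders, lets the same number of filter taps cover a larger input span (the function name and 1-D setting are illustrative, not the paper's actual module):

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D cross-correlation with dilated filter taps.

    With kernel size k and dilation d, the effective receptive
    field grows to d * (k - 1) + 1 samples, i.e. the same weights
    "see" a wider window without adding parameters.
    """
    k = len(w)
    span = dilation * (k - 1) + 1          # effective receptive field
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])

x = np.arange(8, dtype=float)              # toy input signal
w = np.array([1.0, 1.0, 1.0])              # 3-tap box filter

print(dilated_conv1d(x, w, dilation=1))    # each output sums a 3-sample window
print(dilated_conv1d(x, w, dilation=2))    # same 3 taps now span 5 samples
```

With `dilation=1` the filter covers 3 consecutive samples; with `dilation=2` it covers a 5-sample span with the same three weights, which is the field-of-view enlargement the abstract alludes to.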
Publisher
Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering
References (28 articles)
1. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
2. M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks," in European Conference on Computer Vision, pp. 818-833, Springer, 2014.
3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
4. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
5. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.