Affiliation:
1. School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
Abstract
Visual saliency models imitate the attentive mechanism of the human visual system (HVS) to detect objects that stand out from their neighbors in a scene. Biological phenomena in the HVS, such as contextual cueing effects, suggest that contextual information about the whole scene guides the attentive mechanism: the saliency value of each image patch is influenced by its local visual features as well as by the context of the whole scene. Modern saliency models are based on deep convolutional neural networks. Because convolutional operators act locally and share weights, such networks inherently have difficulty capturing global and location-dependent features, and they compute saliency values pixel-wise from local features alone. It is therefore necessary to provide global features alongside local ones. We propose two approaches for capturing contextual information from the scene. In our first method, we introduce a shift-variant fully connected component that captures global and location-dependent information. In our second method, instead of the native CNN of our base model, we use a VGGNet to capture the global context of the scene. To demonstrate the effectiveness of these methods, we use them to extend the SAM-ResNet saliency model and evaluate the resulting models on four challenging saliency benchmark datasets. The experimental results show that our methods outperform existing state-of-the-art saliency prediction models.
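The first approach, a shift-variant fully connected component, can be illustrated with a minimal sketch. The module below is an assumption-laden illustration in PyTorch, not the paper's actual architecture: the class name, layer sizes, and fusion by addition are all hypothetical. It shows the core idea that a fully connected layer applied to the flattened feature map has no weight sharing across positions, so its output can encode both global scene content and absolute location.

```python
import torch
import torch.nn as nn

class ShiftVariantContext(nn.Module):
    """Minimal sketch of a shift-variant fully connected context branch.

    Unlike a convolution, the fully connected mapping uses a distinct
    weight for every spatial position, so its output depends on absolute
    location as well as on the whole-scene (global) content. All names
    and sizes here are illustrative assumptions, not the paper's actual
    configuration.
    """

    def __init__(self, channels: int, height: int, width: int, hidden: int = 256):
        super().__init__()
        n = channels * height * width
        # Fully connected path from the flattened feature map to a
        # per-location context map: shift-variant by construction.
        self.fc = nn.Sequential(
            nn.Linear(n, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, height * width),
        )
        self.height, self.width = height, width

    def forward(self, local_feats: torch.Tensor) -> torch.Tensor:
        b = local_feats.shape[0]
        context = self.fc(local_feats.flatten(1))          # (b, h*w)
        context = context.view(b, 1, self.height, self.width)
        # Fuse the global, location-dependent context with the local
        # features; additive fusion is one simple choice among several.
        return local_feats + context

# Hypothetical usage on a 512-channel, 30x40 feature map:
# module = ShiftVariantContext(channels=512, height=30, width=40)
# out = module(torch.randn(2, 512, 30, 40))
```

Because the layer's weights differ for every spatial position, the branch is shift-variant by construction; a weight-shared convolutional branch could not express the same location-dependent bias.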
Subject
Computer Networks and Communications, Computer Science Applications