Abstract
With the rapid development of media technology, video quality assessment (VQA) is becoming increasingly important. VQA has applications in many domains: in remote medical diagnosis, it can enhance the quality of video communication between doctors and patients, and in sports broadcasting, it can improve video clarity. Within VQA, the human visual system (HVS) is a crucial factor that should be taken into account. Because attention is guided by goal‐driven, top‐down factors, such as anticipated locations or particularly salient frames within a video, we propose a blind VQA algorithm based on a spatial‐temporal attention model. Specifically, we first use two pretrained convolutional networks to extract low‐level static‐dynamic fusion features. A spatial attention‐guided model is then established to obtain more representative frame‐level quality‐perception features. Next, a temporal attention‐guided model aggregates these into video‐level features. Finally, the features are fed into a regression model to compute the final video quality score. Experiments conducted on seven VQA databases achieve state‐of‐the‐art performance, demonstrating the effectiveness of the proposed method.
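The pipeline described above (spatial attention pooling over regions, temporal attention pooling over frames, then regression) can be sketched numerically as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, the random features standing in for the CNN outputs, and the single learned weight vector per attention stage (`w_s`, `w_t`, `w_q`) are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax used to turn scores into attention weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: T frames, R spatial regions per frame,
# D-dimensional static-dynamic fusion features (stand-ins for CNN outputs).
rng = np.random.default_rng(0)
T, R, D = 8, 16, 32
features = rng.standard_normal((T, R, D))

# Spatial attention: score each region, pool regions into one frame-level feature.
w_s = rng.standard_normal(D)                         # assumed scoring vector
alpha = softmax(features @ w_s, axis=1)              # (T, R) spatial weights
frame_feats = (alpha[..., None] * features).sum(1)   # (T, D) frame-level features

# Temporal attention: weight frames, pool frames into one video-level feature.
w_t = rng.standard_normal(D)                         # assumed scoring vector
beta = softmax(frame_feats @ w_t)                    # (T,) temporal weights
video_feat = beta @ frame_feats                      # (D,) video-level feature

# Regression head: here a plain linear map from the video feature to a score.
w_q = rng.standard_normal(D)                         # assumed regression weights
score = float(video_feat @ w_q)
```

Each attention stage reduces one axis (regions, then frames) by a convex combination, so the final feature emphasizes the regions and frames the model deems most quality-relevant.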
Funder
National Natural Science Foundation of China