Emotional response is highly subjective in human cognition, and a single discrete emotion label can hardly capture an immersive scene, which places higher demands on emotion computation in photography. This article therefore first constructs a photographic scene recognition model and then builds a visual emotion analysis model based on a CNN that refines the basic VGG19 architecture. The user's shooting-context information is extracted from the corresponding image metadata, a mapping between context and emotion is established, and an embedding layer produces a low-dimensional dense vector representation of the context features. Eight emotion categories are defined; model accuracy is compared across variants, and the distribution of scene-emotion features across different works is analyzed. The results show that after multimodal fusion the scene-emotion recognition model for photographic works achieves a high accuracy of 73.9%, and that different shooting scenes distinguish the emotional characteristics of the works.
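The abstract describes the model only at a high level; the sketch below illustrates one plausible multimodal architecture consistent with that description: a VGG19-based image branch, an embedding branch for categorical shooting-context metadata, and a fused classifier over eight emotion categories. The layer sizes, field names, and the concatenation-based fusion strategy are assumptions for illustration, not the authors' published implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class SceneEmotionNet(nn.Module):
    """Illustrative multimodal model: VGG19 image features fused with an
    embedded shooting-context vector, classified into 8 emotion categories.
    Layer sizes and the concatenation fusion are assumptions."""

    def __init__(self, num_context_values=64, context_dim=16, num_emotions=8):
        super().__init__()
        # Image branch: VGG19 convolutional backbone
        # (pretrained weights could be loaded in practice).
        vgg = models.vgg19(weights=None)
        self.backbone = vgg.features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.img_fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 512),
            nn.ReLU(inplace=True),
        )
        # Context branch: a categorical metadata ID (e.g. a scene or shooting
        # mode derived from EXIF) mapped to a low-dimensional dense vector.
        self.context_emb = nn.Embedding(num_context_values, context_dim)
        # Fusion and classification over the eight emotion categories.
        self.classifier = nn.Sequential(
            nn.Linear(512 + context_dim, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_emotions),
        )

    def forward(self, image, context_id):
        x = self.img_fc(self.pool(self.backbone(image)))  # (B, 512)
        c = self.context_emb(context_id)                   # (B, context_dim)
        fused = torch.cat([x, c], dim=1)                   # simple concat fusion
        return self.classifier(fused)                      # (B, 8) emotion logits

if __name__ == "__main__":
    # Usage example with dummy tensors.
    model = SceneEmotionNet()
    imgs = torch.randn(4, 3, 224, 224)       # batch of RGB photographs
    ctx = torch.randint(0, 64, (4,))          # categorical context IDs
    print(model(imgs, ctx).shape)             # torch.Size([4, 8])
```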