Affiliation:
1. School of Mechatronic Engineering and Automation, Shanghai University, 200444 Shanghai, P. R. China
Abstract
Detecting the interaction between humans and objects in images is a critical problem for obtaining a deeper understanding of the visual relationship in a scene and also a critical technology in many practical applications, such as augmented reality, video surveillance and information retrieval. Be that as it may, due to the fine-grained actions and objects in the real scene and the coexistence of multiple interactions in one scene, the problem is far from being solved. This paper differs from prior approaches, which focused only on the features of instances, by proposing a method that utilizes a four-stream CNNs network for human-object interaction (HOI) detection. More detailed visual features, spatial features and pose features from human-object pairs are extracted to solve the challenging task of detection in images. Specially, the core idea is that the region where people interact with objects contains important identifying cues for specific action classes, and the detailed cues can be fused to facilitate HOI recognition. Experiments on two large-scale HOI public benchmarks, V-COCO and HICO-DET, are carried out and the results show the effectiveness of the proposed method.
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence,Computer Vision and Pattern Recognition,Software
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献