Affiliation:
1. School of ECE, Peking University, Shenzhen, China
2. Peng Cheng Laboratory, Shenzhen, China
Abstract
Human-Object Interaction (HOI) detection devotes to learn how humans interact with surrounding objects via inferring triplets of < human, verb, object >. However, recent HOI detection methods mostly rely on additional annotations (e.g., human pose) and neglect powerful interactive reasoning beyond convolutions. In this paper, we present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs, in which interactive semantics implied among visual targets are efficiently exploited. The proposed model consists of a project function that maps related targets from convolution space to a graph-based semantic space, a message passing process propagating semantics among all nodes and an update function transforming the reasoned nodes back to convolution space. Furthermore, we construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet. Beyond inferring HOIs using instance features respectively, the framework dynamically parses pairwise interactive semantics among visual targets by integrating two-level in-Graphs, i.e., scene-wide and instance-wide in-Graphs. Our framework is end-to-end trainable and free from costly annotations like human pose. Extensive experiments show that our proposed framework outperforms existing HOI detection methods on both V-COCO and HICO-DET benchmarks and improves the baseline about 9.4% and 15% relatively, validating its efficacy in detecting HOIs.
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
10 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献