HOME: 3D Human–Object Mesh Topology-Enhanced Interaction Recognition in Images

Published: 2022-08-10 Issue: 16 Volume: 10 Page: 2841
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics

Author:

Peng Weilong, Li Cong, Tang Keke, Liu Xianyong, Fang Meie

Abstract

Human–object interaction (HOI) recognition is challenging because occlusions, viewpoints, and poses introduce ambiguity. Because the image domain carries limited interaction information, extracting 3D features from a point cloud has become an important way to improve HOI recognition performance. However, such features neglect the topological features of adjacent points at the low level and the deep topological relation between a human and an object at the high level. In this paper, we present a 3D human–object mesh topology-enhanced method (HOME) for HOI recognition in images. First, a human–object mesh (HOM) is built by integrating the human and object meshes reconstructed from images. Then, under the assumption that the interaction arises from the macroscopic pattern formed by the spatial position and the microscopic topology of the human–object pair, the HOM is fed into MeshCNN, which extracts edge features by edge-based convolution from the bottom up; these serve as topological features that encode the invariance of the interaction relationship. Finally, the topological cues are fused with visual cues to improve recognition performance. In experiments, the method improves mean average precision (mAP) by about 4.3% on the Rare cases of the HICO-DET dataset, which verifies its effectiveness.
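
The pipeline described in the abstract (build one mesh from the human and object meshes, work on its edges, then fuse topological and visual cues) can be illustrated with a small sketch. This is not the authors' implementation: the function names merge_meshes, unique_edges, and fuse_scores are hypothetical, merging by simple concatenation is an assumption, and the weighted average used for fusion is a stand-in, since the abstract does not specify the fusion rule. The MeshCNN feature extraction step itself is omitted.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def merge_meshes(
    human_v: Sequence[Vertex], human_f: Sequence[Face],
    object_v: Sequence[Vertex], object_f: Sequence[Face],
) -> Tuple[List[Vertex], List[Face]]:
    """Concatenate two triangle meshes into one (assumed way to form a HOM)."""
    offset = len(human_v)  # object vertex indices shift past the human vertices
    vertices = list(human_v) + list(object_v)
    faces = list(human_f) + [(a + offset, b + offset, c + offset)
                             for a, b, c in object_f]
    return vertices, faces


def unique_edges(faces: Sequence[Face]) -> List[Tuple[int, int]]:
    """Collect undirected edges; edge-based convolution operates on these."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def fuse_scores(visual: Sequence[float], topological: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted average of per-class scores (hypothetical fusion rule)."""
    return [(1.0 - weight) * v + weight * t
            for v, t in zip(visual, topological)]


# Toy data: the "human" is one triangle, the "object" is two triangles
# sharing an edge. Real meshes would come from 3D reconstruction.
human_v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
human_f = [(0, 1, 2)]
object_v = [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
object_f = [(0, 1, 2), (1, 3, 2)]

hom_v, hom_f = merge_meshes(human_v, human_f, object_v, object_f)
hom_edges = unique_edges(hom_f)
fused = fuse_scores([0.2, 0.8], [0.6, 0.4])
print(len(hom_v), len(hom_f), len(hom_edges), fused)
```

In this toy case the merged mesh keeps the human faces unchanged and re-indexes the object faces, so the two parts stay distinguishable by index range while sharing one edge list.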

Funder

National Natural Science Foundation of China

Guangdong Basic and Applied Basic Research Foundation

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/10/16/2841/pdf
