Affiliation:
1. Kyoto University, Japan
Abstract
Visual tracking of humans or objects in motion is a challenging problem when observed data undergo appearance changes (e.g., due to illumination variations, occlusion, cluttered background, etc.). Moreover, tracking systems are usually initialized with predefined target templates, or trained beforehand using known datasets. Hence, they are not always efficient to detect and track objects whose appearance changes over time. In this paper, we propose a multimodal framework based on particle filtering for visual tracking of objects under challenging conditions (e.g., tracking various human body parts from multiple views). Particularly, the authors integrate various cues such as color, motion and depth in a global formulation. The Earth Mover distance is used to compare color models in a global fashion, and constraints on motion flow features prevent common drifting effects due to error propagation. In addition, the model features an online mechanism that adaptively updates a subspace of multimodal templates to cope with appearance changes. Furthermore, the proposed model is integrated in a practical detection and tracking process, and multiple instances can run in real-time. Experimental results are obtained on challenging real-world videos with poorly textured models and arbitrary non-linear motions.