Affiliation:
1. Inter-University Semiconductor Research Center (ISRC)
2. Department of Electrical and Computer Engineering, Seoul National University
Abstract
Multi-camera instance association, the task of identifying the same object across multi-view images that each contain multiple objects, is challenging due to several harsh constraints. Most prior studies tackle this problem with CNN feature extractors, which often fail under such constraints. Inspired by the Vision Transformer (ViT), we first develop a pure ViT-based framework that extracts robust features through self-attention and residual connections. We then propose two novel methods for robust feature learning. First, we introduce learnable pseudo 3D position embeddings (P3DEs) that represent an object's 3D location in the world coordinate system, which is independent of the harsh constraints. To generate P3DEs, we encode the camera ID and the object's 2D position in the image using embedding tables, and we build a framework that trains P3DEs to represent the object's 3D position in a weakly supervised manner. Second, we use joint patch generation (JPG), which treats an object and its surroundings as a single input patch to reinforce the relationship information between the two features. Experimental results demonstrate that both ViT-P3DE and ViT-P3DE with JPG achieve state-of-the-art performance and significantly outperform existing works, especially under extremely harsh constraints.
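The abstract states only that P3DEs are generated by encoding the camera ID and the object's 2D image position with embedding tables. The following is a minimal sketch of what such a lookup could look like, not the authors' implementation. The quantization of pixel coordinates into bins, the summation of the three embeddings, and every name used (EmbeddingTable, quantize, PseudoPositionEmbedding, num_bins) are assumptions made for illustration. The tables are randomly initialised here; in a real model they would be trainable parameters.

```python
import random


class EmbeddingTable:
    """Lookup table of vectors (hypothetical; trainable in a real model)."""

    def __init__(self, num_entries, dim, rng):
        self.weight = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(num_entries)
        ]

    def __call__(self, index):
        return self.weight[index]


def quantize(value, extent, num_bins):
    """Map a pixel coordinate in [0, extent) to a bin index in [0, num_bins)."""
    idx = int(value / extent * num_bins)
    # Clamp so coordinates on or outside the image border stay valid.
    return min(max(idx, 0), num_bins - 1)


class PseudoPositionEmbedding:
    """Assumed design: sum of camera-ID, x-bin, and y-bin embeddings."""

    def __init__(self, num_cameras, num_bins, dim, seed=0):
        rng = random.Random(seed)
        self.num_bins = num_bins
        self.camera = EmbeddingTable(num_cameras, dim, rng)
        self.x = EmbeddingTable(num_bins, dim, rng)
        self.y = EmbeddingTable(num_bins, dim, rng)

    def __call__(self, camera_id, cx, cy, width, height):
        x_bin = quantize(cx, width, self.num_bins)
        y_bin = quantize(cy, height, self.num_bins)
        parts = (self.camera(camera_id), self.x(x_bin), self.y(y_bin))
        # Element-wise sum of the three looked-up vectors.
        return [sum(values) for values in zip(*parts)]


# Usage: embedding for an object centred at (320, 120) in camera 2.
p3de = PseudoPositionEmbedding(num_cameras=4, num_bins=16, dim=8)
emb = p3de(camera_id=2, cx=320.0, cy=120.0, width=640, height=480)
```

In a ViT-style model, a vector like emb would typically be added to the patch token before the transformer layers. How the paper combines P3DEs with the tokens, and how weak supervision shapes them toward 3D position, is not specified in the abstract.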
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
3 articles.
1. Real Post-Training Quantization Framework for Resource-Optimized Multiplier in LLMs;2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS);2024-04-22
2. A Low-Latency and Scalable Vector Engine with Operation Fusion for Transformers;2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS);2024-04-22
3. A Low-Latency and Lightweight FPGA-Based Engine for Softmax and Layer Normalization Acceleration;2023 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia);2023-10-23