Authors:
Zhao Aojie, Liu Yifan, Cheng Kun, Ma Aiping, Yu Jianguo
Abstract
Indoor robot localization is a challenging problem in computer vision because sensors are easily obstructed in crowded environments. Vision-only localization is increasingly popular since it requires no sensors other than low-cost cameras. We adopt a top-view camera setup, which avoids the localization failures that occlusion can cause for front-view cameras. To mitigate the performance degradation caused by a small dataset, we distill knowledge from CLIP, a large-scale pre-trained vision-language model. Our solution achieves promising performance on our custom classification-based localization test data.
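
The abstract does not state which distillation objective is used. As an illustration only, the sketch below shows one common choice, logit-based knowledge distillation, in which a small student classifier is trained to match the temperature-softened class probabilities of a frozen teacher while also fitting the ground-truth location class. The function names, the temperature, and the weighting factor `alpha` are assumptions made for this example and do not come from the paper; the teacher logits stand in for whatever scores a CLIP-based teacher would produce over the location classes.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical combined loss for one sample (not the paper's stated objective).

    alpha weights the soft-target term (KL divergence from teacher to student
    at the given temperature); 1 - alpha weights the usual cross-entropy
    against the ground-truth location class `label`.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) over the softened class distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Cross-entropy of the student (temperature 1) against the hard label.
    ce = -log_softmax(student_logits)[label]

    # The temperature**2 factor keeps the soft-target gradient scale
    # comparable across temperatures, as is conventional for this loss.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Example: three location classes, ground truth is class 2.
loss = distillation_loss([0.2, 0.1, 1.5], [0.0, 0.3, 2.0], label=2)
```

In this formulation the soft-target term lets the student benefit from the teacher's relative confidence across all location classes, which is one reason distillation can help when labeled data is scarce.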