Authors:
Vesnin Dmitry, Levshun Dmitry, Chechulin Andrey
Abstract
CNN-based off-the-shelf features have proven to be a good baseline for trademark retrieval. In recent years, however, computer vision has been transitioning from CNNs to a new architecture, the Vision Transformer. In this paper, we investigate the performance of off-the-shelf features extracted with Vision Transformers and explore the effects of pre-processing, post-processing, and pre-training on large datasets. We propose a method for the joint use of global and local features that leverages the best aspects of both approaches. Experimental results on the METU Trademark Dataset show that off-the-shelf features extracted with ViT-based models outperform off-the-shelf features from CNN-based models. The proposed method achieves an mAP of 31.23, surpassing previous state-of-the-art results. We expect the proposed approach to trademark similarity evaluation to improve the protection of such data with the help of artificial intelligence methods. Moreover, it will help identify cases of unfair use of such data and form an evidence base for litigation.
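The abstract does not specify how the global and local ViT features are fused. The sketch below illustrates the general idea with off-the-shelf features from a pre-trained ViT: the CLS token serves as a global descriptor, patch tokens serve as local descriptors, and the two similarity scores are combined with a weighted sum. The model name, the fusion rule, and the weight alpha are illustrative assumptions, not the authors' exact method.

```python
# Illustrative sketch of joint global + local off-the-shelf ViT features.
# Assumptions: any timm ViT checkpoint works as the extractor, and
# forward_features() returns the full token sequence (1, 1+N, D).
import timm
import torch
import torch.nn.functional as F
from PIL import Image

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
model.eval()

config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

def global_descriptor(image: Image.Image) -> torch.Tensor:
    """CLS-token embedding used as an L2-normalized global descriptor."""
    x = transform(image).unsqueeze(0)
    with torch.no_grad():
        feat = model(x)                          # (1, D) pooled CLS feature
    return F.normalize(feat, dim=-1)

def local_descriptors(image: Image.Image) -> torch.Tensor:
    """Patch-token embeddings used as L2-normalized local descriptors."""
    x = transform(image).unsqueeze(0)
    with torch.no_grad():
        tokens = model.forward_features(x)       # (1, 1+N, D), CLS token first
    return F.normalize(tokens[:, 1:, :], dim=-1) # drop CLS -> (1, N, D)

def joint_similarity(query: Image.Image, ref: Image.Image,
                     alpha: float = 0.5) -> float:
    """Hypothetical fusion: weighted sum of global cosine similarity and
    a best-match score between patch descriptors."""
    g = (global_descriptor(query) * global_descriptor(ref)).sum().item()
    lq = local_descriptors(query)[0]             # (N, D)
    lr = local_descriptors(ref)[0]               # (N, D)
    # For each query patch, take its best-matching reference patch.
    l = (lq @ lr.T).max(dim=1).values.mean().item()
    return alpha * g + (1 - alpha) * l
```

In a retrieval setting, joint_similarity would be computed between the query mark and each catalog image, and results ranked by score; in practice descriptors would be precomputed and indexed rather than extracted per comparison.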
Cited by
1 article.