Abstract
Few-shot learning for visual recognition aims to adapt to novel, unseen classes from only a few images. Recent work, especially work based on low-level information, has made substantial progress. These methods typically employ local representations (LRs), because LRs are more consistent between seen and unseen classes. However, most of them are limited to an individual image-to-image or image-to-class measure, which cannot fully exploit the capabilities of LRs, especially in the context of a specific task. This paper proposes an Adaptive Task-aware Local Representations Network (ATL-Net) to address this limitation by introducing episodic attention, which adaptively selects the important local patches across the entire task, mimicking the process of human recognition. We achieve superior results on multiple benchmarks. On miniImageNet, ATL-Net improves over the compared methods by 0.93% and 0.88% under the 5-way 1-shot and 5-shot settings, respectively. Moreover, ATL-Net naturally addresses the problem of how to adaptively identify and weight the importance of different key local parts, which is the major concern of fine-grained recognition. Specifically, on the fine-grained dataset Stanford Dogs, ATL-Net outperforms the second-best method by 5.39% and 9.69% under the 5-way 1-shot and 5-shot settings, respectively.
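The abstract does not give ATL-Net's equations, so the following Python sketch is only a rough illustration of the idea it describes, not the paper's method. It pairs an image-to-class measure over local descriptors (each query patch is matched against all support patches of a class, in the spirit of DN4's k-nearest-neighbor measure) with a hypothetical episode-level weighting: a query patch whose best class score barely stands out from the episode average (for example, shared background) is down-weighted. The weighting rule, the names `task_aware_weight`, `threshold`, and `temperature`, and the toy 3-D descriptors are all assumptions made for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two local descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def local_to_class_scores(q_local: Vector, support: Dict[str, List[Vector]], k: int) -> Dict[str, float]:
    """Image-to-class measure for one query patch: mean of its top-k cosine
    similarities to the pooled local descriptors of each support class."""
    scores = {}
    for label, descriptors in support.items():
        sims = sorted((cosine(q_local, d) for d in descriptors), reverse=True)
        top = sims[:k]
        scores[label] = sum(top) / len(top)
    return scores


def task_aware_weight(scores: Dict[str, float], threshold: float, temperature: float) -> float:
    """HYPOTHETICAL episode-level relevance (not ATL-Net's episodic attention):
    a sigmoid of how far the patch's best class score exceeds its mean score
    over all classes in the current task, shifted by a fixed threshold."""
    values = list(scores.values())
    margin = max(values) - sum(values) / len(values)
    return 1.0 / (1.0 + math.exp(-(margin - threshold) / temperature))


def classify(query: List[Vector], support: Dict[str, List[Vector]],
             k: int = 1, threshold: float = 0.1, temperature: float = 0.05) -> Dict[str, float]:
    """Sum weighted per-patch class scores over all query patches."""
    totals = {label: 0.0 for label in support}
    for q_local in query:
        scores = local_to_class_scores(q_local, support, k)
        w = task_aware_weight(scores, threshold, temperature)
        for label, s in scores.items():
            totals[label] += w * s
    return totals


# Toy 2-way 1-shot episode (hypothetical data): the third dimension stands for
# a background patch that appears in both support images.
support = {
    "dog": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    "cat": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}
query = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]

totals = classify(query, support)
print(totals)  # the discriminative patch dominates, so "dog" gets the higher score
```

In this toy episode the background patch matches both classes equally, so its margin is zero and its weight falls well below 0.5, while the dog-like patch keeps a weight near 1. A learned model would replace the fixed `threshold` and `temperature` with trainable components; they are hard-coded here only to keep the sketch self-contained.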
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
41 articles.