Affiliation:
1. Guangdong Key Laboratory of Big Data Intelligence for Vocational Education, Shenzhen Polytechnic University, Shenzhen, China
2. College of Digital Creativity and Animation, Shenzhen Polytechnic University, Shenzhen, China
Abstract
This paper presents a novel approach to face retrieval that leverages large language models and visual foundation models, departing from traditional IoT text retrieval methods that depend on extensive data collection and model training. Because it requires neither text‐image pair collection nor model training, the method substantially reduces the data and computational costs of IoT applications while achieving 72% top‐1 accuracy and 93% top‐3 accuracy on the Celeb‐A dataset. These gains in efficiency and performance point toward more scalable, cost‐effective, and accurate face recognition in IoT systems. The successful application of zero‐shot face retrieval illustrates the impact that advanced AI models can have on real‐world applications and opens new avenues for research and development in intelligent IoT devices.
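As a reading aid for the reported top‐1 and top‐3 figures, the following is a minimal sketch of how top‐k accuracy is commonly computed for embedding-based retrieval. It is not the authors' pipeline, which the abstract does not detail. It assumes that each text query and each gallery face has already been mapped to a vector and that candidates are ranked by cosine similarity; the function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_accuracy(query_vecs, gallery_vecs, true_ids, k):
    """Fraction of queries whose true gallery index is among the k best matches."""
    hits = 0
    for query, target in zip(query_vecs, true_ids):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_vecs)),
            key=lambda i: cosine(query, gallery_vecs[i]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy embeddings (hypothetical values, not taken from the paper).
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.6, 0.5]]
true_ids = [0, 0]  # both queries describe gallery item 0

top1 = top_k_accuracy(queries, gallery, true_ids, k=1)
top2 = top_k_accuracy(queries, gallery, true_ids, k=2)
```

In this toy case the second query ranks gallery item 2 first and the correct item second, so it counts as a miss at k=1 and a hit at k=2. The same mechanism explains why a top‐3 score can be well above a top‐1 score.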