Affiliation:
1. Department of EECE, GITAM School of Technology, GITAM Deemed to be University, Rushikonda, Visakhapatnam 530045, India
Abstract
This paper presents a novel methodology for Content-Based Image Retrieval (CBIR) that shifts the focus from conventional domain-specific image queries to complex text-based query processing. Latent diffusion models are employed to interpret complex textual prompts and transform them into visual representations, establishing a connection between textual descriptions and visual content. A custom triplet network is at the heart of the retrieval method: once trained, it produces feature representations of both the generated query image and the images in the database. Cosine similarity between these feature representations is then used to find and retrieve the relevant images. Our experimental results show that latent diffusion models can bridge the gap between complex textual prompts and image retrieval without relying on labels or metadata attached to the database images. This advancement sets the stage for future work in image retrieval that leverages generative AI capabilities to meet the evolving demands of big data and complex query interpretation.
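The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the triplet network architecture, the distance used in the loss, or the margin, so the cosine-distance hinge loss, the `margin=0.2` default, and the function names `triplet_loss` and `rank_by_cosine` are all assumptions made for illustration. The embeddings are plain NumPy arrays standing in for the output of a trained triplet network.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two 1-D vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine distance.

    Assumption: the margin value and the choice of cosine distance are
    illustrative; the abstract does not state what the paper uses.
    """
    return max(
        0.0,
        cosine_distance(anchor, positive) - cosine_distance(anchor, negative) + margin,
    )


def rank_by_cosine(query_vec, db_vecs, top_k=3):
    """Rank database embeddings by cosine similarity to the query embedding.

    query_vec: shape (d,), embedding of the generated query image.
    db_vecs:   shape (n, d), embeddings of the database images.
    Returns the indices of the top_k matches and their similarity scores.
    """
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    scores = db @ q                      # cosine similarity per database image
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]


# Toy 2-D embeddings (hypothetical values, not data from the paper).
query = np.array([1.0, 0.0])
database = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
indices, similarities = rank_by_cosine(query, database)
```

In this toy setup, the database vector pointing in the same direction as the query ranks first regardless of its length, because cosine similarity compares direction only. No labels or metadata are consulted at any point, which matches the label-free retrieval setting the abstract describes.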