Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Published: 2024-05-23 Volume: 13 Issue: 11 Page: 1628
ISSN: 2304-8158
Container-title: Foods
Language: en
Short-container-title: Foods

Author:

Zou Zhuoyang¹, Zhu Xinghui¹, Zhu Qinying¹, Zhang Hongyan¹, Zhu Lei¹

Affiliation:

1. College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China

Abstract

Cross-modal recipe retrieval is a prominent topic in food computing and has attracted substantial attention. However, existing solutions lack intra-modal alignment, which limits how far the semantic alignment between food images and recipes can be improved. They also overlook a critical issue, food image ambiguity, which disrupts model convergence. To address both problems, we propose a novel Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval (MMACMR). To consider inter-modal and intra-modal alignment together, the method measures the similarity of ambiguous food images under the guidance of their corresponding recipes. We also enhance recipe semantic representation learning with a cross-attention module between ingredients and instructions, which supports the food image similarity measurement. In experiments on the challenging public dataset Recipe1M, our method outperforms several state-of-the-art methods on commonly used evaluation criteria.
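
The abstract does not specify how the cross-attention module between ingredients and instructions is built. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in plain Python, where ingredient embeddings act as queries and instruction embeddings act as keys and values. The function names, the absence of learned projection matrices, and the toy embeddings are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention (hypothetical helper).

    queries: list of d-dimensional vectors, e.g. ingredient embeddings
    keys, values: equal-length lists, e.g. instruction embeddings
    Returns one attended vector per query.
    """
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


# Toy, made-up embeddings (not drawn from Recipe1M).
ingredients = [[1.0, 0.0], [0.0, 1.0]]
instructions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ingredients_attended = cross_attention(ingredients, instructions, instructions)
```

Each output vector is a weighted average of the instruction embeddings, so an ingredient representation is pulled toward the instruction steps it is most similar to. A trained model would normally add learned query, key, and value projections and multiple heads, which this sketch leaves out.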

Funder

National Natural Science Foundation of China

Natural Science Foundation of Hunan Province

Scientific Research Project of Hunan Provincial Department of Education

Publisher

MDPI AG

Link

https://www.mdpi.com/2304-8158/13/11/1628/pdf
