1. Arevalo, J., Solorio, T., Montes-y-Gómez, M., & González, F. A. (2017). Gated multimodal units for information fusion. Proceedings of the 5th International Conference on Learning Representations (ICLR Workshop), Toulon, France.
2. Cross-Modal Retrieval in the Cooking Context
3. Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval
4. Multimodal Encoders for Food-Oriented Cross-Modal Retrieval
5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. Proceedings of the 9th International Conference on Learning Representations (ICLR), online.