
Multimodal Fashion Knowledge Extraction as Captioning

Published: 2023-11-26
Container-title: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

Author:

Yuan Yifei¹, Zhang Wenxuan², Deng Yang³, Lam Wai⁴

Affiliation:

1. Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong

2. Alibaba DAMO Academy, Singapore

3. National University of Singapore, Singapore

4. The Chinese University of Hong Kong, Hong Kong

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3624918.3625315

References (50 articles)

1. VQA: Visual Question Answering

2. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 65–72.

3. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020).

4. Describing Clothing by Semantic Attributes

5. Jaemin Cho, Jie Lei, Hao Tan, and Mohit Bansal. 2021. Unifying Vision-and-Language Tasks via Text Generation. In ICML.