
Using Augmented Small Multimodal Models to Guide Large Language Models for Multimodal Relation Extraction

Published:2023-11-10 Issue:22 Volume:13 Page:12208
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

He Wentao¹, Ma Hanjie¹, Li Shaohua¹, Dong Hui², Zhang Haixiang¹, Feng Jie¹

Affiliation:

1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China

2. Hangzhou Codvision Technology Co., Ltd., Hangzhou 311100, China

Abstract

Multimodal Relation Extraction (MRE) is a core task for constructing Multimodal Knowledge Graphs (MKGs). Most current research is based on fine-tuning small-scale single-modal image and text pre-trained models, but we find that image-text datasets from network media suffer from data scarcity, simple text data, and abstract image information, so they require substantial external knowledge for supplementation and reasoning. We use Multimodal Relation Data Augmentation (MRDA) to address the data scarcity problem in MRE, and propose a Flexible Threshold Loss (FTL) to handle the imbalanced entity pair distribution and long-tailed classes. After obtaining prompt information from the small model, which serves as a guide model, we employ a Large Language Model (LLM) as a knowledge engine to supply common sense and reasoning abilities. Notably, both stages of our framework are flexibly replaceable: the first stage can be adapted to multimodal classification tasks for small models, and the second stage can be replaced by more powerful LLMs. In experiments, our EMRE2llm framework achieves state-of-the-art performance on the challenging MNRE dataset, reaching an 82.95% F1 score on the test set.
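
As a rough illustration of the two-stage idea described in the abstract, the sketch below shows one way a small model's ranked relation predictions could be turned into a prompt for an LLM. Everything in it is an assumption made for illustration: the function name `build_guided_prompt`, the candidate format, the prompt wording, and the example relation labels are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def build_guided_prompt(
    text: str,
    head: str,
    tail: str,
    candidates: List[Tuple[str, float]],
    top_k: int = 3,
) -> str:
    """Format a small model's top-k relation guesses as hints for an LLM.

    `candidates` is a hypothetical list of (relation_label, confidence)
    pairs, standing in for the output of the stage-one guide model.
    """
    # Keep only the k most confident candidates, highest first.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    hints = "\n".join(
        f"- {label} (confidence {score:.2f})" for label, score in ranked
    )
    # Assumed prompt layout; a real system would tune this wording.
    return (
        f"Sentence: {text}\n"
        f"Head entity: {head}\n"
        f"Tail entity: {tail}\n"
        f"Candidate relations from the guide model:\n{hints}\n"
        "Choose the single most plausible relation and answer with its label only."
    )


if __name__ == "__main__":
    # Made-up example input, not drawn from the MNRE dataset.
    print(
        build_guided_prompt(
            text="Alice joined Acme Corp last year.",
            head="Alice",
            tail="Acme Corp",
            candidates=[("member_of", 0.71), ("none", 0.08), ("peer", 0.21)],
            top_k=2,
        )
    )
```

Keeping the prompt builder separate from both models reflects the abstract's point that either stage can be swapped out: a different guide model only needs to emit the same (label, confidence) pairs, and a different LLM only needs to accept the resulting string.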

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/22/12208/pdf
