MMTN: Multi-Modal Memory Transformer Network for Image-Report Consistent Medical Report Generation

Published: 2023-06-26 Issue: 1 Volume: 37 Page: 277-285
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Authors:

Cao Yiming, Cui Lizhen, Zhang Lei, Yu Fuqiang, Li Zhen, Xu Yonghui

Abstract

Automatic medical report generation is an essential task in applying artificial intelligence to the medical domain: it can lighten doctors' workloads and promote clinical automation. State-of-the-art approaches employ Transformer-based encoder-decoder architectures to generate reports for medical images. However, they do not fully explore the relationships between multi-modal medical data, and so they generate inaccurate and inconsistent reports. To address these issues, this paper proposes a Multi-modal Memory Transformer Network (MMTN) that handles multi-modal medical data to generate image-report consistent medical reports. On the one hand, MMTN reduces image-report inconsistencies through a dedicated encoder that associates and memorizes the relationship between medical images and medical terminologies. On the other hand, MMTN exploits the cross-modal complementarity of medical vision and language for word prediction, which further improves the accuracy of the generated reports. Extensive experiments on three real datasets show that MMTN significantly outperforms state-of-the-art approaches on both automatic metrics and human evaluation.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 7 articles.

1. A survey on advancements in image–text multimodal models: From general techniques to biomedical implementations; Computers in Biology and Medicine; 2024-08

2. Reinforced Visual Interaction Fusion Radiology Report Generation; 2024-07-31

3. Advancing medical imaging with language models: featuring a spotlight on ChatGPT; Physics in Medicine & Biology; 2024-05-03

4. Unsupervised disease tags for automatic radiology report generation; Biomedical Signal Processing and Control; 2024-03

5. Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays; Lecture Notes in Computer Science; 2024