Medical Question Summarization with Entity-driven Contrastive Learning

Author:

Lu Wenpeng1ORCID,Wei Sibo1ORCID,Peng Xueping2ORCID,Wang Yi-Fei3ORCID,Naseem Usman4ORCID,Wang Shoujin5ORCID

Affiliation:

1. Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences) and Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China

2. Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia

3. Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China

4. School of Computing, Macquarie University, Sydney, Australia

5. Data Science Institute, University of Technology Sydney, Sydney, Australia

Abstract

By summarizing longer consumer health questions into shorter and essential ones, medical question-answering systems can more accurately understand consumer intentions and retrieve suitable answers. However, medical question summarization is very challenging due to obvious distinctions in health trouble descriptions from patients and doctors. Although deep learning has been applied to successfully address the medical question summarization (MQS) task, two challenges remain: how to correctly capture question focus to model its semantic intention, and how to obtain reliable datasets to fairly evaluate performance. To address these challenges, this article proposes a novel medical question summarization framework based on e ntity-driven c ontrastive l earning (ECL). ECL employs medical entities present in frequently asked questions (FAQs) as focuses and devises an effective mechanism to generate hard negative samples. This approach compels models to focus on essential information and consequently generate more accurate question summaries. Furthermore, we have discovered that some MQS datasets, such as the iCliniq dataset with a 33% duplicate rate, have significant data leakage issues. To ensure an impartial evaluation of the related methods, this article carefully examines leaked samples to reorganize more reasonable datasets. Extensive experiments demonstrate that our ECL method outperforms the existing methods and achieves new state-of-the-art performance, i.e., 52.85, 43.16, 41.31, 43.52 in terms of ROUGE-1 metric on MeQSum, CHQ-Summ, iCliniq, HealthCareMagic dataset, respectively. The code and datasets are available at https://github.com/yrbobo/MQS-ECL

Funder

National Natural Science Foundation of China

Shandong Provincial Natural Science Foundation

Program of New Twenty Policies for Universities of Jinan

Program of Innovation Improvement of Shandong

Publisher

Association for Computing Machinery (ACM)

Reference50 articles.

1. Asma Ben Abacha and Dina Demner-Fushman. 2019. On the summarization of consumer health questions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL’19). 2228–2234.

2. Asma Ben Abacha, Yassine M’rabet, Yuhao Zhang, Chaitanya Shivade, Curtis Langlotz, and Dina Demner-Fushman. 2021. Overview of the MEDIQA 2021 shared task on summarization in the medical domain. In Proceedings of the 20th Workshop on Biomedical Language Processing (BioNLP’21). 74–85.

3. A question-entailment approach to question answering;Abacha Asma Ben;BMC Bioinformatics,2019

4. Avi Caciularu, Ido Dagan, Jacob Goldberger, and Arman Cohan. 2022. Long context question answering via supervised contrastive learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’22). 2872–2879.

5. Shuyang Cao and Lu Wang. 2021. CLIFF: Contrastive learning for improving faithfulness and factuality in abstractive summarization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP’21). 6633–6649.

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Thinking the Importance of Patient’s Chief Complaint in TCM Syndrome Differentiation;2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD);2024-05-08

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3