Abstract
Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who needs to understand large quantities of information. In the medical domain, automatic summarization has the potential to make health information more accessible to people without medical expertise. However, to evaluate the quality of summaries generated by summarization algorithms, researchers first require gold-standard, human-generated summaries. Unfortunately, no data are available for assessing summaries that help consumers of health information answer their questions. To address this issue, we present the MEDIQA-Answer Summarization dataset, the first dataset designed for question-driven, consumer-focused summarization. It contains 156 health questions asked by consumers, answers to these questions, and manually generated summaries of these answers. The dataset’s unique structure allows it to be used for at least eight different types of summarization evaluations. We also benchmark the performance of baseline and state-of-the-art deep learning approaches on the dataset, demonstrating how it can be used to evaluate automatically generated summaries.
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability
Cited by
34 articles.