A dataset for plain language adaptation of biomedical abstracts

Published:2023-01-04 Issue:1 Volume:10
ISSN:2052-4463
Container-title:Scientific Data
language:en
Short-container-title:Sci Data

Author:

Attal Kush, Ondov Brian, Demner-Fushman Dina

Abstract

Though exponentially growing health-related literature has been made available to a broad audience online, the language of scientific articles can be difficult for the general public to understand. Therefore, adapting this expert-level language into plain language versions is necessary for the public to reliably comprehend the vast health-related literature. Deep Learning algorithms for automatic adaptation are a possible solution; however, gold standard datasets are needed for proper evaluation. Proposed datasets thus far consist of either pairs of comparable professional- and general public-facing documents or pairs of semantically similar sentences mined from such documents. This leads to a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset. This dataset is the first manually adapted dataset that is both document- and sentence-aligned. The dataset contains 750 adapted abstracts, totaling 7643 sentence pairs. Along with describing the dataset, we benchmark automatic adaptation on the dataset with state-of-the-art Deep Learning approaches, setting baselines for future research.

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

https://www.nature.com/articles/s41597-022-01920-3.pdf

References (49 articles)

1. MedlinePlus - Health Information from the National Library of Medicine.

2. Rosenberg, S. A. et al. Online patient information from radiation oncology departments is too complex for the general population. Practical Radiation Oncology 7, 57–62, https://doi.org/10.1016/j.prro.2016.07.008 (2017).

3. Stableford, S. & Mettger, W. Plain language: a strategic response to the health literacy challenge. Journal of public health policy 28, 71–93 (2007).

4. Xu, W., Napoles, C., Pavlick, E., Chen, Q. & Callison-Burch, C. Optimizing Statistical Machine Translation for Text Simplification. Transactions of the Association for Computational Linguistics 4, 401–415, https://doi.org/10.1162/tacl_a_00107 (2016).

5. Carlo, M. S. et al. Closing the gap: Addressing the vocabulary needs of english-language learners in bilingual and mainstream classrooms. Reading research quarterly 39, 188–215 (2004).

Cited by 1 article.

1. Retrieval augmentation of large language models for lay language generation. Journal of Biomedical Informatics (2024-01).