Pre-trained Language Models in Biomedical Domain: A Systematic Survey

Authors:

Benyou Wang (1), Qianqian Xie (2), Jiahuan Pei (3), Zhihong Chen (4), Prayag Tiwari (5), Zhao Li (6), Jie Fu (7)

Affiliation:

1. SRIBD & SDS, The Chinese University of Hong Kong, Shenzhen, China

2. Department of Computer Science, University of Manchester, United Kingdom

3. University of Amsterdam, Netherlands

4. SRIBD & SSE, The Chinese University of Hong Kong, Shenzhen, China

5. School of Information Technology, Halmstad University, Sweden

6. The University of Texas Health Science Center at Houston, USA

7. Mila, University of Montreal, Canada

Abstract

Pre-trained language models (PLMs) have become the de facto paradigm for most natural language processing tasks. This also benefits the biomedical domain: researchers from the informatics, medicine, and computer science communities have proposed various PLMs trained on biomedical datasets, e.g., biomedical text, electronic health records, and protein and DNA sequences, for various biomedical tasks. However, the cross-discipline characteristics of biomedical PLMs hinder their spread across communities, and some existing works are isolated from each other without comprehensive comparison and discussion. It is nontrivial to write a survey that not only systematically reviews recent advances in biomedical PLMs and their applications but also standardizes terminology and benchmarks. This article summarizes recent progress on pre-trained language models in the biomedical domain and their applications in downstream biomedical tasks. In particular, we discuss the motivations for PLMs in the biomedical domain and introduce the key concepts of pre-trained language models. We then propose a taxonomy of existing biomedical PLMs that categorizes them systematically from various perspectives. In addition, their applications in downstream biomedical tasks are discussed exhaustively. Finally, we illustrate various limitations and future trends, aiming to provide inspiration for future research.
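As a concrete illustration of the paradigm the survey covers, the sketch below shows how a biomedical PLM can be loaded and used to produce contextual representations that a downstream biomedical task (e.g., named entity recognition or relation extraction) would build on. This is a minimal example, not the article's own code; it assumes the Hugging Face `transformers` library and the availability of the `dmis-lab/biobert-base-cased-v1.1` checkpoint on the Hugging Face Hub, and any other checkpoint of a biomedical PLM could be substituted.

```python
# Minimal sketch: load a biomedical PLM and extract contextual token embeddings.
# Assumption: the BioBERT checkpoint "dmis-lab/biobert-base-cased-v1.1" is
# available on the Hugging Face Hub; swap in any other biomedical PLM as needed.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "dmis-lab/biobert-base-cased-v1.1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "Aspirin inhibits platelet aggregation."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings for each token; a task-specific head (classifier,
# tagger, etc.) is typically fine-tuned on top of these representations.
token_embeddings = outputs.last_hidden_state  # shape: (1, seq_len, hidden_size)
print(token_embeddings.shape)
```

In the fine-tuning setting discussed throughout the survey, such a backbone would be wrapped with a task-specific output layer and trained end-to-end on labeled biomedical data rather than used only as a frozen feature extractor.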

Funder

Chinese Key-Area Research and Development Program of Guangdong Province

Shenzhen Science and Technology Program

Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen, Shenzhen Key Research Project

Shenzhen Doctoral Startup Funding

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Theoretical Computer Science

References: 346 articles.

