Abstract
In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized models have since emerged to handle specific domains more effectively. In this paper, we propose an original study of PLMs in the medical domain applied to the French language. We compare, for the first time, the performance of PLMs trained on both public data from the web and private data from healthcare establishments. We also evaluate different learning strategies on a set of biomedical tasks. In particular, we show that we can take advantage of an already existing biomedical PLM in a foreign language by further pre-training it on our targeted data. Finally, we release the first specialized PLMs for the biomedical field in French, called DrBERT, as well as the largest corpus of medical data under free license on which these models are trained.
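The continued pre-training strategy mentioned in the abstract (adapting an already existing biomedical PLM to French medical text) can be sketched as follows. This is a minimal illustration assuming a Hugging Face Transformers masked-language-modeling workflow; the starting checkpoint, corpus file name, and hyperparameters are placeholders for illustration, not the paper's exact configuration.

```python
# Minimal sketch: continue masked-language-model pre-training of an existing
# biomedical PLM on a new French medical corpus (Hugging Face Transformers).
# The checkpoint name, corpus path, and hyperparameters below are assumptions,
# not the configuration reported in the paper.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed starting checkpoint: an English biomedical PLM.
checkpoint = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Placeholder corpus file: one raw-text document per line.
corpus = load_dataset("text", data_files={"train": "french_medical_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking of 15% of tokens, as in standard BERT-style MLM pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="continued-pretraining",  # illustrative output directory
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

The resulting checkpoint can then be fine-tuned on downstream biomedical tasks in the usual way; the design choice here is simply to reuse the existing model's weights and vocabulary rather than pre-training from scratch.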
Publisher
Cold Spring Harbor Laboratory