Masked Sentence Model Based on BERT for Move Recognition in Medical Scientific Abstracts

Published: 2019-12-01 Issue: 4 Volume: 4 Page: 42-55
ISSN: 2543-683X
Container-title: Journal of Data and Information Science
Language: en

Author:

Yu Gaihong¹², Zhang Zhixiong¹²³, Liu Huan¹², Ding Liangping¹²

Affiliation:

1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China

2. University of Chinese Academy of Sciences, Beijing 100049, China

3. Wuhan Library, Chinese Academy of Sciences, Wuhan 430071, China

Abstract

Purpose: Move recognition in scientific abstracts is an NLP task that classifies the sentences of an abstract into different types of language units. To improve the performance of move recognition in scientific abstracts, this paper proposes a novel move recognition model that outperforms the BERT-based method.

Design/methodology/approach: Prevalent BERT-based models for sentence classification often classify sentences without considering their context. Inspired by the BERT masked language model (MLM), we propose a novel model, called the masked sentence model, that integrates the content and contextual information of sentences for move recognition. Experiments are conducted on the benchmark dataset PubMed 20K RCT in three steps. We then compare our model with the HSLN-RNN, BERT-based, and SciBERT models on the same dataset.

Findings: The F1 score of our model exceeds those of the BERT-based and SciBERT models by 4.96% and 4.34%, respectively, which shows the feasibility and effectiveness of the novel model. Its result comes closest to the current state-of-the-art result of HSLN-RNN.

Research limitations: The sequential features of move labels are not considered, which might be one of the reasons why HSLN-RNN performs better. Our model is restricted to biomedical English literature because we fine-tune it on a dataset from PubMed, a typical biomedical database.

Practical implications: The proposed model is both more effective and simpler at identifying move structures in scientific abstracts, and it merits further text classification experiments on capturing the contextual features of sentences.

Originality/value: The study proposes a masked sentence model based on BERT that considers the contextual features of the sentences in abstracts in a new way. The performance of this classification model is significantly improved by rebuilding the input layer without changing the structure of the neural network.
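
The abstract does not spell out the exact input format, so the following is only a minimal sketch of one plausible reading of "rebuilding the input layer": each sentence (the content) is paired with the rest of the abstract, in which that sentence is replaced by a mask placeholder (the context). The function name `build_masked_sentence_inputs`, the `[MASK]` placeholder, and the example sentences are assumptions for illustration, not taken from the paper.

```python
from typing import List, Tuple

# Assumption: a single placeholder token stands in for the hidden sentence.
MASK_TOKEN = "[MASK]"


def build_masked_sentence_inputs(
    sentences: List[str], mask_token: str = MASK_TOKEN
) -> List[Tuple[str, str]]:
    """Pair every sentence with its abstract-level context.

    For sentence i, the context is the whole abstract with sentence i
    replaced by `mask_token`, so a classifier can see both what the
    sentence says and where it sits among its neighbours.
    """
    pairs = []
    for i, sentence in enumerate(sentences):
        context = " ".join(
            mask_token if j == i else other
            for j, other in enumerate(sentences)
        )
        pairs.append((sentence, context))
    return pairs


if __name__ == "__main__":
    # Hypothetical three-sentence abstract, used only to show the pairing.
    abstract = [
        "Move recognition classifies abstract sentences.",
        "We propose a masked sentence model.",
        "The model improves the F1 score.",
    ]
    for content, context in build_masked_sentence_inputs(abstract):
        print(content, "||", context)
```

Each (content, context) pair could then be passed to an unmodified BERT sentence-pair classifier, which would be consistent with the abstract's statement that only the input is rebuilt while the network structure stays the same.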

Publisher

Walter de Gruyter GmbH

Link

https://www.sciendo.com/pdf/10.2478/jdis-2019-0020

References (23 articles)

1. Amini, I., Martinez, D., & Molla, D. (2012). Overview of the ALTA 2012 shared task. In Proceedings of the Australasian Language Technology Association Workshop 2012: ALTA 2012 (pp. 124–129). Dunedin, New Zealand.

2. Badie, K., Asadi, N., & Tayefeh Mahmoudi, M. (2018). Zone identification based on features with high semantic richness and combining results of separate classifiers. Journal of Information and Telecommunication, 2(4), 411–427.

3. Basili, R. & Pennacchiotti, M. (2010). Distributional lexical semantics: Toward uniform representation paradigms for advanced acquisition and processing tasks. Natural Language Engineering, 1(1), 1–12.

4. Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: Pretrained contextualized embeddings for scientific text. arXiv:1903.10676v3.

5. Dasigi, P., Burns, G.A.P.C., Hovy, E., & Waard, A. (2017). Experiment segmentation in scientific discourse as clause-level structured prediction using recurrent neural networks. arXiv:1702.05398.

Cited by 4 articles.

1. RCMR 280k: Refined Corpus for Move Recognition Based on PubMed Abstracts;Data Intelligence;2023

2. A Novel Hierarchical Discourse Model for Scientific Article and It’s Efficient Top-K Resampling-based Text Classification Approach;2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC);2022-10-09

3. Effects of three-layer structure and age on mechanical properties of moso bamboo;BioResources;2021-02-08

4. Prediction of chronic kidney disease risk using multimodal data;2021 The 5th International Conference on Compute and Data Analysis;2021-02-02