Affiliation:
1. Saama AI Research Lab, Pune 411057, India
Abstract
Motivation
Medical data are complex because terms that appear in records often appear in different contexts. In this article, we investigate the embeddings of several biomedical models (BioBERT, BioELECTRA and PubMedBERT) for their understanding of ‘negation and speculation context’, and we found that these models were unable to differentiate ‘negated context’ from ‘non-negated context’. To measure the models’ understanding, we used cosine similarity scores between the embeddings of negated and non-negated sentence pairs. To improve these models, we introduce a generic super-tuning approach that enhances the embeddings on ‘negation and speculation context’ using a synthesized dataset.
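The similarity measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the vectors are made-up stand-ins for sentence embeddings (no model is loaded), and the function names `cosine_similarity` and `mean_pair_similarity` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pair_similarity(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average cosine similarity over (negated, non-negated) embedding pairs."""
    scores = [cosine_similarity(neg, pos) for neg, pos in pairs]
    return sum(scores) / len(scores)


# Hypothetical toy embeddings: each tuple stands in for the embedding of a
# negated sentence and the embedding of its non-negated counterpart.
toy_pairs = [
    ([0.9, 0.1, 0.3], [0.8, 0.2, 0.3]),
    ([0.2, 0.7, 0.5], [0.2, 0.6, 0.6]),
]

score = mean_pair_similarity(toy_pairs)
print(f"mean cosine similarity: {score:.3f}")
```

A mean score close to 1 would indicate that the embeddings barely separate a negated sentence from its non-negated counterpart; a lower score would indicate that the two contexts are placed further apart.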
Results
After super-tuning, the models’ embeddings capture negative and speculative contexts much better. Furthermore, we fine-tuned the super-tuned models on various tasks and found that they outperformed previous models and achieved state-of-the-art results on negation and speculation cue and scope detection tasks on the BioScope abstracts and Sherlock datasets. We also confirmed that super-tuning incurred only a minimal trade-off in model performance on other tasks such as natural language inference.
Availability and implementation
The source code, data and the models are available at: https://github.com/comprehend/engg-ai-research/tree/uncertainty-super-tuning.
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
Cited by
1 article.