How different is different? Systematically identifying distribution shifts and their impacts in NER datasets

Published: 2024-07-18
ISSN: 1574-020X
Container-title: Language Resources and Evaluation
Language: en
Short-container-title: Lang Resources & Evaluation

Author:

Li Xue, Groth Paul

Abstract

When processing natural language, we are frequently confronted with the problem of distribution shift. For example, a model trained on a news corpus exhibits reduced performance when it is subsequently used to process legal text. While this problem is well-known, to this point, there has not been a systematic study of detecting shifts and investigating the impact shifts have on model performance for NLP tasks. Therefore, in this paper, we detect and measure two types of distribution shift, across three different representations, for 12 benchmark Named Entity Recognition datasets. We show that both input shift and label shift can lead to dramatic performance degradation. For example, fine-tuning on a wide-spectrum dataset (OntoNotes) and testing on an email dataset (CEREC) that shares labels leads to a 63-point drop in F1 performance. Overall, our results indicate that the measurement of distribution shift can provide guidance on the amount of data needed for fine-tuning and on whether a model can be used “off-the-shelf” without subsequent fine-tuning. Finally, our results show that shift measurement can play an important role in NLP model pipeline definition.

Funder

Nederlandse Organisatie voor Wetenschappelijk Onderzoek

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10579-024-09754-8.pdf

Reference: 35 articles.

1. Arora, U., Huang, W., He, H. (2021). Types of Out-of-Distribution Texts and How to Detect Them. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 10687–10701). Online and Punta Cana, Dominican Republic: Association for Computational Linguistics. [2022-07-18] https://aclanthology.org/2021.emnlp-main.835 https://doi.org/10.18653/v1/2021.emnlp-main.835

2. Balasuriya, D., Ringland, N., Nothman, J., Murphy, T., Curran, J. R. (2009). Named entity recognition in Wikipedia. In Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web) (pp. 10–18). Suntec, Singapore: Association for Computational Linguistics. https://aclanthology.org/W09-3302

3. Cobb, O., Van Looveren, A. (2022). Context-Aware Drift Detection. arXiv:2203.08644. https://doi.org/10.48550/arXiv.2203.08644

4. Csurka, G. (2017). Domain adaptation for visual applications: A comprehensive survey. CoRR, arXiv:1702.05374

5. Dai, X., Karimi, S., Hachey, B., Paris, C. (2019). Using similarity measures to select pretraining data for NER. CoRR, arXiv:1904.00585