Affiliation:
1. Language Intelligence Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
Abstract
We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome labeled-data scarcity and domain shifts, we use transfer learning with our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method that uses a teacher/student framework with both labeled and unlabeled data. Our model introduces two modifications. First, the student model is updated with the average of the losses on human-labeled and pseudo-labeled data. Second, to mitigate the influence of noisy pseudo-labeled data, the teacher model is updated only when the feedback score is below a threshold (0.0005). We achieve the target NER performance in the spoken language domain, and we improve performance in the written language domain with a straightforward rollback method that reverts to the best model as measured on the scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.
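The training control logic summarized above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. Only the averaged student loss, the 0.0005 threshold, and the idea of reverting to the best model on human-labeled data come from the abstract. The function names, the MPL-style definition of the feedback score (change in the student's loss on human-labeled data after its pseudo-label update), and the use of a single dev-set score as the rollback criterion are hypothetical. Actual model training is abstracted away as plain floats and a state dictionary.

```python
import copy
from dataclasses import dataclass
from typing import Any, Optional

# Threshold value from the abstract; which quantity it gates is an assumption here.
FEEDBACK_THRESHOLD = 0.0005


def student_loss(human_loss: float, pseudo_loss: float) -> float:
    """Modification 1: average the losses on human-labeled and pseudo-labeled batches."""
    return (human_loss + pseudo_loss) / 2.0


def feedback_score(labeled_loss_before: float, labeled_loss_after: float) -> float:
    """Hypothetical MPL-style feedback: how much the student's loss on human-labeled
    data dropped after its update on pseudo-labeled data (positive = improvement)."""
    return labeled_loss_before - labeled_loss_after


def should_update_teacher(feedback: float, threshold: float = FEEDBACK_THRESHOLD) -> bool:
    """Modification 2 (assumed reading): update the teacher only when the
    feedback score is below the threshold."""
    return feedback < threshold


@dataclass
class RollbackTracker:
    """Keeps the best model seen so far, scored on scarce human-labeled dev data,
    and reverts to it whenever a newer model scores no better."""
    best_score: float = float("-inf")
    best_state: Optional[Any] = None

    def step(self, state: Any, dev_score: float) -> Any:
        if dev_score > self.best_score:
            # New best: remember a snapshot and keep training from this state.
            self.best_score = dev_score
            self.best_state = copy.deepcopy(state)
            return state
        # Roll back: continue from a copy of the best state instead.
        return copy.deepcopy(self.best_state)


# Usage with toy values (hypothetical states and scores).
tracker = RollbackTracker()
state = tracker.step({"step": 1}, dev_score=0.80)  # new best, kept
state = tracker.step({"step": 2}, dev_score=0.78)  # worse, rolled back
print(state)  # {'step': 1}
```

Snapshotting with `copy.deepcopy` keeps the stored best model independent of later in-place updates; in a real framework one would store and reload checkpointed weights instead.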
Funder
Institute for Information and Communications Technology Promotion
Cited by
1 article.