Affiliation:
1. Chulalongkorn University, Pathumwan, Thailand
2. NECTEC, Pathumthani, Thailand
Abstract
Sequential tagging tasks, such as Part-of-Speech (POS) tagging and Named Entity Recognition (NER), are the building blocks of many natural language processing applications. Although prior work has reported promising results in standard settings, these models often underperform on non-standard text, such as microblogs and other social media. In this article, we introduce an adversarial evaluation scheme for the Thai language by creating adversarial examples based on known spelling errors. Furthermore, we propose novel methods, including UNK masking, condition initialization with affixation embeddings, and an untied-directional self-attention mechanism, to enhance the robustness and interpretability of neural networks. We conducted experiments on two Thai corpora: BEST2010 and ORCHID. Our adversarial evaluation scheme reveals that bidirectional LSTM (BiLSTM) models do not perform well on adversarial examples. Our best methods match the performance of the BiLSTM baseline on standard text and outperform it on adversarial examples.
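As a rough illustration of what "adversarial examples based on known spelling errors" can look like for a tagging task, the sketch below swaps correctly spelled words for known misspelled variants while leaving the gold tags unchanged, so a tagger can be scored on the perturbed sentence. This is a minimal sketch under stated assumptions, not the authors' procedure: the lexicon `SPELLING_ERRORS`, the function `make_adversarial`, its `rate` parameter, and the tag labels are all hypothetical. The two Thai entries are common real-world misspellings chosen for illustration only and are not taken from BEST2010, ORCHID, or the paper's error list.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical misspelling lexicon: correct word -> observed misspelled variants.
# Illustrative entries only; not the paper's actual error list.
SPELLING_ERRORS: Dict[str, List[str]] = {
    "อนุญาต": ["อนุญาติ"],  # "permit", often misspelled with a trailing vowel
    "กะเพรา": ["กระเพรา"],  # "holy basil", often misspelled with an extra consonant
}


def make_adversarial(
    tokens: Sequence[str],
    tags: Sequence[str],
    lexicon: Dict[str, List[str]],
    rate: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical perturbation: replace words that have known misspellings.

    Each eligible token is replaced with probability `rate`; the gold tags
    are copied unchanged, since a misspelling should not change the label.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducible perturbations
    perturbed: List[str] = []
    for tok in tokens:
        variants = lexicon.get(tok)
        if variants and rng.random() < rate:
            perturbed.append(rng.choice(variants))
        else:
            perturbed.append(tok)
    return perturbed, list(tags)


if __name__ == "__main__":
    sent = ["ขอ", "อนุญาต"]  # "ask for permission"
    gold = ["VERB", "VERB"]  # placeholder tags, not ORCHID/BEST2010 tagsets
    print(make_adversarial(sent, gold, SPELLING_ERRORS))
```

Keeping the tags fixed is the key design choice here: any drop in accuracy on the perturbed sentences can then be attributed to the tagger's sensitivity to spelling variation rather than to a change in the target labels.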
Funder
Thailand Graduate Institute of Science and Technology
Publisher
Association for Computing Machinery (ACM)