Affiliation:
1. College of Computer and Information Sciences, King Saud University, P.O. Box 2614, Riyadh 13312, Saudi Arabia
Abstract
With the advent of pre-trained language models, many natural language processing tasks in various languages have achieved great success. Although some research has been conducted on fine-tuning BERT-based models for syntactic parsing, and several Arabic pre-trained models have been developed, little attention has been paid to Arabic dependency parsing. In this study, we attempt to fill this gap and compare nine Arabic pre-trained models, fine-tuning strategies, and encoding methods for dependency parsing. We evaluated three treebanks to identify the best options and methods for fine-tuning Arabic BERT-based models to capture syntactic dependencies in the data. Our exploratory results show that the AraBERTv2 model provides the best scores on all treebanks and confirm that fine-tuning the higher layers of the pre-trained models is required; however, adding extra neural network layers on top of these models reduces accuracy. We also found that the treebanks differ in which encoding technique yields the highest scores. An analysis of the errors on the test examples highlights four issues that strongly affect the results: parse tree post-processing, contextualized embeddings, erroneous tokenization, and erroneous annotation. This study points to directions for future research on improving Arabic BERT-based syntactic parsing.
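The abstract refers to fine-tuning BERT-based encoders for dependency parsing. Below is a minimal sketch of what such a setup can look like, assuming the HuggingFace transformers library, the aubmindlab/bert-base-arabertv2 checkpoint, and a simple biaffine-style arc scorer; the head dimensions, pooling shortcuts, and example sentence are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' exact setup): an Arabic BERT encoder with a
# biaffine-style head that scores every (dependent, head) token pair.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

ENCODER = "aubmindlab/bert-base-arabertv2"  # assumed AraBERTv2 checkpoint


class BiaffineArcScorer(nn.Module):
    def __init__(self, encoder_name=ENCODER, arc_dim=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.head_mlp = nn.Sequential(nn.Linear(hidden, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(hidden, arc_dim), nn.ReLU())
        # Biaffine weight; the extra row acts as a bias for the dependent side.
        self.arc_attn = nn.Parameter(torch.randn(arc_dim + 1, arc_dim) * 0.01)

    def forward(self, input_ids, attention_mask):
        # Contextualized subword states; pooling subwords into words is omitted.
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        heads = self.head_mlp(states)   # token representations as candidate heads
        deps = self.dep_mlp(states)     # token representations as dependents
        deps = torch.cat([deps, torch.ones_like(deps[..., :1])], dim=-1)
        # scores[b, i, j] = score of token j governing token i
        return deps @ self.arc_attn @ heads.transpose(-1, -2)


tokenizer = AutoTokenizer.from_pretrained(ENCODER)
batch = tokenizer(["قرأ الطالب الكتاب"], return_tensors="pt")
scores = BiaffineArcScorer()(batch["input_ids"], batch["attention_mask"])
print(scores.shape)  # (1, sequence_length, sequence_length)
```

Training such a scorer would add a cross-entropy loss over gold head indices per token, and decoding at inference time typically uses a tree algorithm such as Chu-Liu/Edmonds; both are omitted from this sketch.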
Funder
Deanship of Scientific Research at King Saud University through the initiative of DSR Graduate Students Research Support
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
8 articles.