Dependency Parsing of Modern Standard Arabic with Lexical and Inflectional Features

Author:

Marton Yuval1,Habash Nizar2,Rambow Owen2

Affiliation:

1. Nuance Communications

2. Center for Computational Learning Systems, Columbia University

Abstract

We explore the contribution of lexical and inflectional morphology features to dependency parsing of Arabic, a morphologically rich language with complex agreement patterns. Using controlled experiments, we contrast the contribution of different part-of-speech (POS) tag sets and morphological features in two input conditions: machine-predicted condition (in which POS tags and morphological feature values are automatically assigned), and gold condition (in which their true values are known). We find that more informative (fine-grained) tag sets are useful in the gold condition, but may be detrimental in the predicted condition, where they are outperformed by simpler but more accurately predicted tag sets. We identify a set of features (definiteness, person, number, gender, and undiacritized lemma) that improve parsing quality in the predicted condition, whereas other features are more useful in gold. We are the first to show that functional features for gender and number (e.g., “broken plurals”), and optionally the related rationality (“humanness”) feature, are more helpful for parsing than form-based gender and number. We finally show that parsing quality in the predicted condition can dramatically improve by training in a combined gold+predicted condition. We experimented with two transition-based parsers, MaltParser and Easy-First Parser. Our findings are robust across parsers, models, and input conditions. This suggests that the contribution of the linguistic knowledge in the tag sets and features we identified goes beyond particular experimental settings, and may be informative for other parsers and morphologically rich languages.

Publisher

MIT Press - Journals

Subject

Artificial Intelligence,Computer Science Applications,Linguistics and Language,Language and Linguistics

Cited by 44 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Hybrid embeddings for transition-based dependency parsing of free word order languages;Information Processing & Management;2023-05

2. Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle;Computational Linguistics;2021-11

3. Index;The Cambridge Handbook of Arabic Linguistics;2021-09-30

4. Maltese;The Cambridge Handbook of Arabic Linguistics;2021-09-30

5. Language Planning in the Arab World in an Age of Anxiety;The Cambridge Handbook of Arabic Linguistics;2021-09-30

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3