Multiword Expression Processing: A Survey

Authors:

Mathieu Constant (1), Gülşen Eryiğit (2), Johanna Monti (3), Lonneke van der Plas (4), Carlos Ramisch (5), Michael Rosner (4), Amalia Todirascu (6)

Affiliations:

1. ATILF, Université de Lorraine & CNRS, 44 avenue de la Libération, 54000, Nancy, France.

2. Istanbul Technical University, ITU Department of Computer Engineering, 34469, Istanbul, Turkey.

3. “L'Orientale” University of Naples, Palazzo Santa Maria Porta Coeli, Via Duomo 219, 80138, Naples, Italy.

4. University of Malta, Tal-Qroqq, Msida MSD2080, Malta.

5. Aix Marseille University, CNRS, LIF, 163 av de Luminy – case 901, 13288, Marseille Cedex 9, France.

6. LiLPa, Strasbourg University, 22, rue René Descartes, BP 80010, 67084, Strasbourg Cedex, France.

Abstract

Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by “MWE processing,” distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude with open issues and research perspectives.
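The distinction between the two subtasks named above can be made concrete with a toy example. The sketch below is illustrative only and is not taken from the survey: it assumes that discovery means ranking candidate word pairs in a corpus to propose new lexicon entries (here using pointwise mutual information, PMI, one common association measure chosen purely for illustration), and that identification means tagging occurrences of known lexicon entries in running text (here with a simple greedy longest-match lookup over contiguous tokens). The function names `discover_bigrams` and `identify` are hypothetical.

```python
import math
from collections import Counter


def discover_bigrams(sentences, min_count=2):
    """Toy MWE discovery (hypothetical helper): rank adjacent word pairs by PMI.

    P(w1, w2) is estimated over all bigrams, P(w) over all unigrams.
    Pairs seen fewer than `min_count` times are skipped.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x, p_y = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scored.append(((w1, w2), math.log2(p_xy / (p_x * p_y))))
    # Highest association first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def identify(tokens, lexicon):
    """Toy MWE identification (hypothetical helper): greedy longest match.

    `lexicon` is a set of token tuples; matched spans are joined with "_".
    Only contiguous MWEs are handled, a simplifying assumption.
    """
    max_len = max((len(entry) for entry in lexicon), default=1)
    output, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in lexicon:
                output.append("_".join(candidate))
                i += n
                break
        else:
            # No lexicon entry starts here: emit the single token.
            output.append(tokens[i])
            i += 1
    return output
```

For example, on a three-sentence corpus containing "the red tape slowed the project", "we cut the red tape" and "the project was late", `discover_bigrams` ranks `("red", "tape")` above `("the", "red")` and `("the", "project")`, and a discovered pair could then be added to the lexicon used by `identify`. Running identification as a separate step before parsing or translation is only one possible orchestration; as the abstract notes, approaches differ in how this processing is timed with respect to the use case.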

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Cited by 93 articles.