Affiliation:
1. Peking University
2. Beijing Language and Culture University
3. NLRMR for Print Media
Abstract
In this study, we propose a new evaluation scheme to assess the strengths and limitations of collocation extraction measures and explore type-sensitive methods for extracting collocations. We introduce the pooling strategy widely used in Information Retrieval and automate the evaluation process using online dictionaries. Sixteen well-known metrics are evaluated for effectiveness and then compared in terms of their distributional and linguistic characteristics. The results show that Group A methods (e.g. z-score, Dice, PMI) are more effective at extracting low-frequency collocations at relatively small extraction scales. In contrast, Group B methods (e.g. t-test, LMI, LLR) perform better at finding high-frequency collocations, and most of them outperform Group A methods as the extraction scale increases. Moreover, Group A prefers NN collocations, while Group B identifies collocations with a wide range of syntactic structures. This study offers suggestions for future work on hybrid extraction methods, as well as for language educators and dictionary compilers.
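The contrast between the two groups can be illustrated with a minimal sketch using common textbook formulations of five of the measures named above. The formulas, the function name `association_scores`, and the toy counts are assumptions made for illustration; the paper's exact definitions and data may differ.

```python
import math


def association_scores(f_xy, f_x, f_y, n):
    """Score a word pair from raw counts (hypothetical helper, not from the paper).

    f_xy: co-occurrence frequency of the pair
    f_x, f_y: frequencies of the individual words
    n: total number of pairs in the corpus
    """
    observed = f_xy
    expected = f_x * f_y / n  # expected frequency under independence
    return {
        "pmi": math.log2(observed / expected),
        "dice": 2 * observed / (f_x + f_y),
        "z_score": (observed - expected) / math.sqrt(expected),
        "t_score": (observed - expected) / math.sqrt(observed),
        "lmi": observed * math.log2(observed / expected),
    }


# Toy counts, invented for illustration only.
N = 1_000_000
rare = association_scores(f_xy=5, f_x=6, f_y=7, n=N)
frequent = association_scores(f_xy=500, f_x=20_000, f_y=5_000, n=N)

for name in rare:
    winner = "rare pair" if rare[name] > frequent[name] else "frequent pair"
    print(f"{name:8s} rare={rare[name]:10.3f} frequent={frequent[name]:10.3f} -> {winner}")
```

With these toy counts, PMI, Dice, and z-score rank the rare pair higher, while the t-score and LMI rank the frequent pair higher. This is consistent with the frequency preferences reported for Group A and Group B, although a toy example cannot stand in for the paper's evaluation.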
Publisher
John Benjamins Publishing Company
Subject
Linguistics and Language, Language and Linguistics
Cited by
1 article.