Affiliation:
1. Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, PUE, Mexico
2. Department of Systems, Universidad Autónoma Metropolitana Unidad Azcapotzalco, CDMX, Mexico
Abstract
Automatic validation of compositionality versus non-compositionality is a very challenging problem in NLP, and only a small number of papers in the literature report results on it. Recently, some new approaches to this linguistic task have arisen. One that has caught our attention is based on what its authors call the “lexical domain”. In this paper, we analyze the use of Pointwise Mutual Information (PMI) for constructing thesauri on the fly, which can then be employed instead of dictionaries to determine whether a given phraseological unit is compositional. The experimental results reported in this paper show that this measure (PMI) can be used effectively to determine the compositionality of a given verbal phraseological unit. Moreover, we show that using thesauri improves on the results obtained in experiments that employ dictionaries, which highlights the value of self-constructed lexical resources that draw on the vocabulary of the target dataset itself.
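As a rough illustration of the thesaurus-construction idea, the sketch below computes PMI(x, y) = log2(P(x, y) / (P(x) P(y))) from co-occurrence counts and keeps, for each word, its most strongly associated neighbours. It is a minimal sketch under assumptions that the abstract does not confirm: co-occurrence is taken to mean "appears in the same sentence", the function name `build_pmi_thesaurus`, the `top_k` and `min_pmi` parameters, and the toy corpus are all hypothetical, and the compositionality decision itself is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi_thesaurus(sentences, top_k=3, min_pmi=0.0):
    """Map each word to its top_k co-occurring words ranked by PMI.

    Assumption: two words co-occur when they appear in the same
    sentence; the context window used in the paper is not stated
    in the abstract.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # count each word once per sentence
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    n = len(sentences)
    neighbours = {}
    for (x, y), c_xy in pair_counts.items():
        # PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with probabilities
        # estimated as sentence-level relative frequencies.
        pmi = math.log2((c_xy / n) / ((word_counts[x] / n) * (word_counts[y] / n)))
        if pmi > min_pmi:
            neighbours.setdefault(x, []).append((y, pmi))
            neighbours.setdefault(y, []).append((x, pmi))

    # Highest PMI first; ties are broken alphabetically for determinism.
    return {
        word: [w for w, _ in sorted(entries, key=lambda e: (-e[1], e[0]))[:top_k]]
        for word, entries in neighbours.items()
    }


# Hypothetical toy corpus, one tokenized sentence per list.
corpus = [
    ["kick", "the", "bucket"],
    ["kick", "the", "ball"],
    ["throw", "the", "ball"],
    ["fill", "the", "bucket"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "sleeps"],
]
thesaurus = build_pmi_thesaurus(corpus)
print(thesaurus["kick"])
```

Because the thesaurus is built from the target corpus itself, every entry is guaranteed to use the dataset's own vocabulary. A word that appears in every sentence (here "the") has a PMI of zero with everything and is filtered out by the `min_pmi` threshold.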
Subject
Artificial Intelligence, General Engineering, Statistics and Probability