Author:
Albujasim Zainab, Inkpen Diana, Guo Yuhong
Abstract
Word embeddings are the foundation of modern natural language processing (NLP). Over the last few decades, word representations have evolved remarkably, yielding impressive performance in downstream NLP applications. Yet the interpretability of word embeddings remains a challenge. In this paper, we propose a simple technique for interpreting word embeddings. Our method is based on a post-processing technique that improves the quality of word embeddings and reveals their hidden structure. We deploy a co-clustering method to uncover this hidden structure and to detect sub-matrices linking word meanings to specific dimensions. Empirical evaluation on several benchmarks shows that our method achieves competitive results compared to the original word embeddings.
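As a rough illustration of the co-clustering idea described in the abstract, the sketch below applies scikit-learn's SpectralCoclustering to a toy word embedding matrix. The vocabulary, the random matrix, and the choice of spectral co-clustering are illustrative assumptions, not the paper's exact method or preprocessing.

```python
# Minimal sketch: co-clustering a (words x dimensions) embedding matrix
# to expose sub-matrices pairing word groups with dimension groups.
# Assumes scikit-learn's SpectralCoclustering as a stand-in for the
# paper's co-clustering method; data here is a hypothetical toy example.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)

# Hypothetical toy setup: 8 words, 10 embedding dimensions.
vocab = ["king", "queen", "man", "woman", "paris", "london", "rome", "berlin"]
embeddings = rng.normal(size=(len(vocab), 10))

# Spectral co-clustering operates on non-negative matrices, so we
# cluster the magnitudes of the embedding values; the paper's actual
# preprocessing may differ.
model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(np.abs(embeddings))

# Each bicluster is a sub-matrix: a group of words together with the
# embedding dimensions most strongly associated with them.
for k in range(3):
    rows, cols = model.get_indices(k)
    print(f"bicluster {k}: words={[vocab[i] for i in rows]}, dims={list(cols)}")
```

With real pretrained embeddings in place of the random matrix, inspecting which words and dimensions land in the same bicluster is one way to probe the meaning-to-dimension structure the abstract refers to.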
Publisher
Academy and Industry Research Collaboration Center (AIRCC)