Author:
Si Shijing, Zheng Weiguo, Zhou Liuyang, Zhang Mei
Abstract
Computing semantic similarity between sentences or texts is vital in many natural language processing (NLP) tasks such as search, query suggestion, and question answering (QA). Many methods have been developed, based on lexical matching, distributional semantics, and other techniques; lexical features such as string matching, however, fail to capture semantic similarity. In this research, we focus on the implementation of distributional representations and on how to tune parameters when obtaining word representations with commonly used word embedding techniques, e.g., Word2Vec and GloVe. We conduct experiments on Chinese semantic sentence matching tasks in the finance domain. We assess the quality of word embeddings by comparing the cosine similarity of semantically similar sentence pairs with that of semantically dissimilar pairs. In our experiments, Word2Vec outperforms GloVe in that Chinese character embeddings from Word2Vec yield a larger gap in cosine distance between similar and dissimilar sentence pairs. We also report the optimal parameters for the Word2Vec continuous bag-of-words (CBOW) model found in our trials, a window size of 6 and an embedding dimension of 400, which can serve as good initial values for other projects.
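The evaluation the abstract describes can be sketched as follows: average the embeddings of a sentence's tokens (Chinese characters, in this paper's setting) into a sentence vector, then compare sentence pairs by cosine similarity. This is a minimal illustration, not the authors' exact pipeline; it assumes token embeddings have already been trained (e.g., a Word2Vec CBOW model with window size 6 and dimension 400, the paper's reported optimum), and the function names and toy vectors are hypothetical.

```python
import numpy as np

def sentence_vector(tokens, embeddings, dim=400):
    # Average the embeddings of the tokens that appear in the vocabulary.
    # dim=400 follows the paper's reported optimal embedding dimension.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    # Cosine similarity in [-1, 1]; returns 0.0 for zero-norm vectors.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Toy 2-dimensional embeddings, purely for illustration.
emb = {"银": np.array([1.0, 0.2]), "行": np.array([0.8, 0.4])}
s1 = sentence_vector(["银", "行"], emb, dim=2)
s2 = sentence_vector(["行", "银"], emb, dim=2)
print(cosine_similarity(s1, s2))
```

A well-trained embedding, by the paper's criterion, should make this score high for semantically similar pairs and markedly lower for dissimilar ones.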
Subject
General Physics and Astronomy
Cited by
5 articles.