Abstract
We present a new Japanese dataset, the Japanese Word Similarity and Association Norm (JWSAN), comprising human ratings of similarity and association for 2145 word pairs, with a clear distinction between word similarity and word association. Computational models of human semantic memory or the mental lexicon, such as distributed semantic models, must predict not only association but also similarity, and people can distinguish between the two. However, although the SimLex-999 dataset is publicly available for English, there is no Japanese similarity dataset that clearly distinguishes between the two types of word relatedness. JWSAN is the first large Japanese dataset with both similarity and association ratings, and it contains noun, verb, and adjective word pairs. It is also characterized by data collection from a sufficient number of age- and gender-controlled assessors: the similarity and association ratings were obtained via a web-based survey of 6450 native speakers of Japanese. In addition, the effects of the raters' gender and age were examined; these factors have received only scant consideration in the past. This dataset can serve as a benchmark for improving distributed semantic models of Japanese.
Funder
Japan Society for the Promotion of Science
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences, Linguistics and Language, Education, Language and Linguistics
Cited by
1 article.