Authors
Sunny Mitra, Ritwik Mitra, Suman Kalyan Maity, Martin Riedl, Chris Biemann, Pawan Goyal, Animesh Mukherjee
Abstract
In this paper, we propose an unsupervised and automated method to identify noun sense changes based on rigorous analysis of time-varying text data: millions of digitized books and the millions of tweets posted each day. We construct distributional-thesauri-based networks from data at different time points and cluster each network separately to obtain word-centric sense clusters for each time point. We then propose a split/join based approach that compares the sense clusters at two time points to determine whether a new sense has been ‘born’. The approach also determines whether an older sense has ‘split’ into more than one sense, whether a newer sense has formed from the ‘join’ of older senses, or whether a particular sense has undergone ‘death’. We apply this completely unsupervised approach (a) within the Google Books data, to identify word sense differences within a single medium, and (b) across the Google Books and Twitter data, to identify differences in word sense distribution across media. We evaluate the proposed methodology thoroughly, both manually and through comparison with WordNet.
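The split/join comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each sense cluster is a plain set of words, that an older and a newer cluster "correspond" when their overlap coefficient reaches a hypothetical threshold of 0.5, and the function names `overlap` and `compare_senses` are illustrative only. Under these assumptions, a newer cluster with no older counterpart is a birth, an older cluster with no newer counterpart is a death, an older cluster matched by two or more newer clusters is a split, and a newer cluster matched by two or more older clusters is a join.

```python
from typing import Dict, List, Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Overlap coefficient |a & b| / min(|a|, |b|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def compare_senses(old: List[Set[str]], new: List[Set[str]],
                   threshold: float = 0.5) -> Dict[str, list]:
    """Classify sense changes between two time points (hypothetical matching rule)."""
    # For each older cluster, the indices of newer clusters it corresponds to.
    old_links = {i: [j for j, n in enumerate(new) if overlap(o, n) >= threshold]
                 for i, o in enumerate(old)}
    # For each newer cluster, the indices of older clusters it corresponds to.
    new_links = {j: [i for i, o in enumerate(old) if overlap(o, n) >= threshold]
                 for j, n in enumerate(new)}
    return {
        "birth": [j for j, links in new_links.items() if not links],
        "death": [i for i, links in old_links.items() if not links],
        "split": [(i, links) for i, links in old_links.items() if len(links) >= 2],
        "join": [(j, links) for j, links in new_links.items() if len(links) >= 2],
    }


if __name__ == "__main__":
    # Toy, made-up clusters for "mouse" (not data from the paper).
    old = [{"rat", "rabbit", "hamster", "squirrel"}]
    new = [{"rat", "rabbit", "hamster", "gerbil"},
           {"keyboard", "joystick", "touchpad", "trackball"}]
    print(compare_senses(old, new))
    # {'birth': [1], 'death': [], 'split': [], 'join': []}
```

The overlap coefficient is used here because it tolerates clusters of different sizes; the actual criteria and thresholds used in the paper may differ.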
Publisher
Cambridge University Press (CUP)
Subject
Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software
Cited by 18 articles.