Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure

Published: 2016 Issue: Volume: 2016 Page: 1-11
ISSN: 1687-5265
Container-title: Computational Intelligence and Neuroscience
Language: en
Short-container-title: Computational Intelligence and Neuroscience

Author:

Zhang Wen¹, Xiao Fan¹, Li Bin¹, Zhang Siguang²

Affiliation:

1. Center on Big Data Sciences, Beijing University of Chemical Technology, Beijing 100039, China

2. Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China

Abstract

Recently, LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) was proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, although it has been validated as having good representative quality, it is often criticized for its low discriminative power in representing documents. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. First, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Second, we propose SVD on clusters for LSI and explain theoretically that it involves two manipulations: dimension expansion of document vectors and dimension projection using SVD. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Third, two corpora, one Chinese and one English, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure in comparison with other SVD-based LSI methods.
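
The abstract gives no algorithmic detail, so the following is only a minimal sketch of one plausible reading of "SVD on clusters": the columns (documents) of a term-document matrix are grouped by precomputed cluster labels, each group is replaced by its own rank-k SVD approximation, and interdocument similarity is taken as the cosine between the approximated columns. The function names, the NumPy-only implementation, and the assumption that cluster labels are supplied by the caller are all illustrative and not taken from the paper. The `fold_in_document` helper shows the standard LSI fold-in projection for a single SVD, not the paper's updating process for SVD on clusters.

```python
import numpy as np


def truncated_svd(A, k):
    """Return the top-k singular triplets (U_k, s_k, Vt_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k], s[:k], Vt[:k, :]


def low_rank(A, k):
    """Rank-k approximation of A built from its truncated SVD."""
    U, s, Vt = truncated_svd(A, k)
    return (U * s) @ Vt


def svd_on_clusters(A, labels, k):
    """Replace each cluster's block of document columns by its rank-k approximation.

    A      : (terms x documents) matrix
    labels : one cluster id per document column (assumed to be given)
    k      : rank kept for every cluster (hypothetical single setting)
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    A_hat = np.zeros_like(A)
    for c in np.unique(labels):
        cols = np.flatnonzero(labels == c)
        A_hat[:, cols] = low_rank(A[:, cols], k)
    return A_hat


def cosine_similarity(X):
    """Pairwise cosine similarity between the columns of X."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn.T @ Xn


def fold_in_document(d, U_k, s_k):
    """Standard LSI fold-in: project a new document vector d into the k-dim space."""
    return (U_k.T @ d) / s_k
```

As a usage note, calling `cosine_similarity(svd_on_clusters(A, labels, k))` on a small term-document matrix returns a documents-by-documents similarity matrix that can be compared against `cosine_similarity(low_rank(A, k))`, the plain LSI baseline under the same assumptions.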

Funder

National Natural Science Foundation of China

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science

Link

http://downloads.hindawi.com/journals/cin/2016/1096271.pdf

References (22 articles)

1. Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework

2. Using Linear Algebra for Intelligent Information Retrieval

3. CDIM: Document Clustering by Discrimination Information Maximization

4. Hard and fuzzy diagonal co-clustering for document-term partitioning

Cited by 2 articles.

1. RP-LGMC: Rating prediction based on local and global information with matrix clustering;Computers & Operations Research;2021-05

2. A Two-Stage Rating Prediction Approach Based on Matrix Clustering on Implicit Information;IEEE Transactions on Computational Social Systems;2020-04