A study of user profile representation for personalized cross-language information retrieval
Author:
Zhou Dong, Lawless Séamus, Wu Xuan, Zhao Wenyu, Liu Jianxun
Abstract
Purpose
– As the amount of multilingual content on the World Wide Web grows, users often need to access information in a language they do not speak natively. This paper presents a comprehensive study of user profile representation techniques and investigates their use in personalized cross-language information retrieval (CLIR) systems by means of personalized query expansion.
Design/methodology/approach
– The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
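To make the frequency-based profiles concrete, the following is a minimal sketch of how a weighted-term profile might be built with tf-idf or BM25 and then used for query expansion. It is an illustration under stated assumptions, not the authors' implementation: the function names `build_profile` and `expand_query`, the parameters `k1`, `b` and `n_terms`, the shared smoothed idf, and the toy documents are all hypothetical. The latent semantic models and the cross-lingual step are not covered.

```python
import math
from collections import Counter


def build_profile(user_docs, background_docs, scheme="bm25", k1=1.2, b=0.75):
    """Return {term: weight} from a user's tokenized historical documents.

    Assumption: term statistics (document frequency, average length) come
    from a separate background collection; the paper may do this differently.
    """
    n_docs = len(background_docs)
    df = Counter()  # number of background documents containing each term
    for doc in background_docs:
        df.update(set(doc))
    avg_len = sum(len(doc) for doc in background_docs) / n_docs

    profile = Counter()
    for doc in user_docs:
        for term, freq in Counter(doc).items():
            # Smoothed idf that stays positive; used for both schemes here.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            if scheme == "tfidf":
                weight = freq * idf
            else:
                # BM25 term-frequency saturation with length normalization.
                norm = k1 * (1 - b + b * len(doc) / avg_len)
                weight = idf * freq * (k1 + 1) / (freq + norm)
            profile[term] += weight  # sum weights over the user's history
    return dict(profile)


def expand_query(query_terms, profile, n_terms=2):
    """Append the n_terms highest-weighted profile terms not already in the query."""
    ranked = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [term for term, _ in ranked if term not in query_terms][:n_terms]
    return list(query_terms) + extra


# Toy data, invented purely for illustration.
background = [
    ["football", "match", "goal"],
    ["election", "vote"],
    ["football", "league"],
    ["weather", "rain"],
]
history = [["football", "goal", "goal"], ["league", "football"]]

profile = build_profile(history, background, scheme="tfidf")
print(expand_query(["match"], profile, n_terms=2))
```

Summing per-document weights is one simple way to aggregate a user's history; other aggregations (for example averaging or recency weighting) would fit the same interface.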
Findings
– Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirms that user profiles represented by latent semantic models trained at the cross-lingual level perform better than those represented by models trained at the monolingual level.
Originality/value
– Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies at the monolingual level. The effect of using such monolingual profiles for personalized CLIR remains unclear. The current study fills this gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that allows repeatable and controlled experiments to be conducted.
Subject
Library and Information Sciences, Information Systems
Cited by
7 articles.