Testing the stability of “wisdom of crowds” judgments of search results over time and their similarity with the search engine rankings
Authors:
Zhitomirsky-Geffet, Maayan; Bar-Ilan, Judit; Levene, Mark
Abstract
Purpose
– One of the under-explored aspects of user information-seeking behaviour is the influence of time on relevance evaluation. Previous studies have shown that individual users may change their assessment of search results over time. It is also known that aggregating the judgements of multiple individual users can lead to correct and reliable decisions; this phenomenon is known as the “wisdom of crowds”. The purpose of this paper is to examine whether aggregated judgements are more stable, and thus more reliable, over time than individual user judgements.
Design/methodology/approach
– In this study, two simple measures are proposed to calculate the aggregated judgements of search results, and their reliability and stability are compared to those of individual user judgements. In addition, the aggregated “wisdom of crowds” judgements were used to compare human assessments of search results with search engine rankings. A large-scale user study was conducted with 87 participants, who evaluated two different queries and four diverse result sets twice, with an interval of two months. Two types of judgements were considered: relevance on a four-point scale, and ranking on a ten-point scale without ties.
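The abstract does not specify the two aggregation measures, so the following is only a minimal illustrative sketch, assuming one plausible pair: mean relevance per result (for the four-point relevance judgements) and a consensus ranking obtained by ordering results by their average rank position (for the ten-point rankings). The function names and the toy data are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of "wisdom of crowds" aggregation of search-result
# judgements. Assumed measures (not specified in the abstract):
#   (a) mean relevance score per result over all assessors;
#   (b) consensus ranking by average rank position over all assessors.

def aggregate_relevance(judgements):
    """Mean relevance per result over all assessors (4-point scale)."""
    return {doc: sum(scores) / len(scores) for doc, scores in judgements.items()}

def consensus_ranking(rankings):
    """Order results by their average rank position (lower = better)."""
    avg = {doc: sum(positions) / len(positions) for doc, positions in rankings.items()}
    return sorted(avg, key=avg.get)

# Toy data: two assessors judging three results at one point in time.
relevance = {"r1": [4, 3], "r2": [2, 2], "r3": [1, 2]}
ranks = {"r1": [1, 2], "r2": [3, 1], "r3": [2, 3]}

print(aggregate_relevance(relevance))  # {'r1': 3.5, 'r2': 2.0, 'r3': 1.5}
print(consensus_ranking(ranks))        # ['r1', 'r2', 'r3']
```

Stability over time could then be assessed by computing such aggregates separately for the two evaluation sessions (two months apart) and measuring how much they agree, e.g. with a rank-correlation coefficient, whereas individual assessors' session-to-session agreement would be computed per person.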
Findings
– It was found that aggregated judgements are much more stable than individual user judgements, yet they differ considerably from search engine rankings.
Practical implications
– The proposed “wisdom of crowds”-based approach provides a reliable reference point for the evaluation of search engines. It is also important for exploring the need for personalisation and for adapting search engine rankings over time to changes in users’ preferences.
Originality/value
– This is the first study to apply the notion of the “wisdom of crowds” to a phenomenon under-explored in the literature: the change over time in users’ evaluation of relevance.
Subject
Library and Information Sciences, Information Systems
References: 64 articles.
Cited by: 10 articles.