Abstract
Academic search systems aid users in finding information on specific topics of scientific interest and have evolved from early catalog-based library systems to modern web-scale systems. However, evaluating the performance of the underlying retrieval approaches remains a challenge. An increasing number of requirements for producing accurate retrieval results has to be considered, e.g., the close integration of the system's users. Due to these requirements, small to mid-size academic search systems cannot evaluate their retrieval systems in-house. Evaluation infrastructures for shared tasks alleviate this situation: they allow researchers to experiment with retrieval approaches in specific search and recommendation scenarios without building their own infrastructure. In this paper, we elaborate on the benefits and shortcomings of four state-of-the-art evaluation infrastructures for search and recommendation tasks with respect to the following requirements: support for online and offline evaluations, domain specificity of shared tasks, and reproducibility of experiments and results. In addition, we introduce an evaluation infrastructure concept design aimed at reducing the shortcomings of shared tasks for search and recommender systems.
Funder
GESIS – Leibniz-Institut für Sozialwissenschaften e.V.
Publisher
Springer Science and Business Media LLC
Subject
General Earth and Planetary Sciences, General Environmental Science
Cited by
4 articles.