Evaluation-as-a-Service for the Computational Sciences

Authors:

Frank Hopfgartner (1), Allan Hanbury (2), Henning Müller (3), Ivan Eggel (3), Krisztian Balog (4), Torben Brodt (5), Gordon V. Cormack (6), Jimmy Lin (6), Jayashree Kalpathy-Cramer (7), Noriko Kando (8), Makoto P. Kato (9), Anastasia Krithara (10), Tim Gollub (11), Martin Potthast (12), Evelyne Viegas (13), Simon Mercer (14)

Affiliation:

1. University of Sheffield, United Kingdom

2. TU Wien, Complexity Science Hub Vienna, Vienna, Austria

3. University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland

4. University of Stavanger, Stavanger, Norway

5. plista GmbH, Berlin, Germany

6. University of Waterloo, Waterloo, Canada

7. Athinoula A. Martinos Center for Biomedical Imaging at Massachusetts General Hospital and Harvard Medical School, Charlestown, MA, USA

8. National Institute of Informatics, Tokyo, Japan

9. Kyoto University, Yoshida Honmachi, Sakyo, Kyoto, Japan

10. National Center for Scientific Research “Demokritos”, Agia Paraskevi, Athens, Greece

11. Bauhaus-Universität Weimar, Weimar, Germany

12. Leipzig University, Leipzig, Germany

13. Microsoft Research, Redmond, WA, USA

14. Independent Consultant

Abstract

Evaluation in empirical computer science is essential for demonstrating progress and assessing the technologies developed. Several research domains, such as information retrieval, have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted to this day. In recent years, however, several new challenges have emerged that do not fit this paradigm well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way in which industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their problems, particularly in the field of machine learning. This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants to work on locally, but instead keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other means of shipping executables to the data. The objectives of this article are to summarize and compare the current approaches and to consolidate the experiences gained with them in order to outline the next steps of EaaS, particularly toward sustainable research infrastructures. The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are summarized, as is the environment in terms of the motivations of the various stakeholders, from funding agencies and challenge organizers to researchers and participants, and to industry partners interested in supplying real-world problems for which they require solutions. EaaS solves many problems of the current research environment, in which data sets are often not accessible to many researchers. Executables of published tools are equally often unavailable, making the reproduction of results impossible. EaaS, by contrast, creates reusable and citable data sets as well as accessible executables. Many challenges remain, but such a framework for research can also foster closer collaboration between researchers, potentially increasing the speed at which research results are obtained.
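To make the EaaS paradigm described in the abstract concrete, the following minimal sketch (in Python, using the requests library) illustrates how a participant might interact with a hypothetical EaaS service: instead of downloading the test data, the participant registers a runnable system (here, a container image) with the central service, which executes it against the held-back data and returns only aggregate scores. All endpoint names, fields, and identifiers below are assumptions for illustration and do not correspond to any specific platform discussed in the article.

# Illustrative sketch only: a hypothetical EaaS client. The test data never
# leaves the central service; participants ship an executable and receive
# aggregate evaluation measures in return.
import requests

EAAS_BASE_URL = "https://eaas.example.org/api/v1"   # hypothetical service
API_TOKEN = "participant-token"                      # issued by the organizers


def submit_run(task_id: str, image_ref: str) -> str:
    """Register a containerized system for evaluation on a given task.

    Only a reference to the participant's executable (a container image)
    is shipped; the data set itself stays on the server side.
    """
    resp = requests.post(
        f"{EAAS_BASE_URL}/tasks/{task_id}/runs",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"image": image_ref},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["run_id"]


def fetch_scores(task_id: str, run_id: str) -> dict:
    """Retrieve aggregate evaluation measures once the run has finished.

    Only summary metrics (e.g., MAP or nDCG) are returned, never the
    confidential or proprietary test data.
    """
    resp = requests.get(
        f"{EAAS_BASE_URL}/tasks/{task_id}/runs/{run_id}/scores",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    run_id = submit_run("adhoc-retrieval-task", "registry.example.org/team/my-system:1.0")
    print(fetch_scores("adhoc-retrieval-task", run_id))

This design keeps the data central while the executables travel, which is what enables EaaS to handle confidential, very large, or rapidly changing data sets; concrete platforms may instead accept VM images or source-code submissions, but the division of responsibilities remains the same.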

Funder

European Science Foundation via its Research Network Program “Evaluating Information Access Systems”

European Commission via the FP7 project VISCERAL

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems and Management, Information Systems


Cited by 16 articles.

1. Browsing and Searching Metadata of TREC. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024-07-10.

2. The Information Retrieval Experiment Platform. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023-07-18.

3. Continuous Integration for Reproducible Shared Tasks with TIRA.io. Lecture Notes in Computer Science, 2023.

4. ir_metadata: An Extensible Metadata Schema for IR Experiments. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022-07-06.

5. Analytics Methods to Understand Information Retrieval Effectiveness—A Survey. Mathematics, 2022-06-19.
