Advancing throughput of HEP analysis work-flows using caching concepts

Author:

Caspart Rene, Fischer Max, Giffels Manuel, Heidecker Christoph, Kühn Eileen, Quast Günter, Sauter Martin, Schnepf Matthias J., von Cube R. Florian

Abstract

High throughput and short turnaround cycles are core requirements for efficient processing of data-intensive end-user analyses in High Energy Physics (HEP). Combined with the rapidly growing volume of data to be processed, this poses major challenges for HEP storage systems, for networks, and for the distribution of data to the computing resources used by end-user analyses. Bringing data close to the computing resource is a very promising approach to overcoming throughput limitations and improving overall performance. However, achieving data locality by placing multiple conventional caches inside a distributed computing infrastructure leads to redundant data placement and inefficient use of the limited cache volume. The solution is a coordinated placement of critical data on computing resources, which makes it possible to match each process of an analysis work-flow to its most suitable worker node in terms of data locality and thus reduces the overall processing time. This coordinated distributed caching concept was realized at KIT by developing the coordination service NaviX, which connects an XRootD cache proxy infrastructure with an HTCondor batch system. We give an overview of the coordinated distributed caching concept and of the experience gained with a prototype system based on NaviX.
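The matching idea described in the abstract can be illustrated with a minimal sketch. It is not the NaviX implementation and uses no XRootD or HTCondor API. The function names, the file-overlap score, the greedy assignment, and the slot model are all assumptions made for illustration only.

```python
def locality_score(job_files, cached_files):
    """Fraction of a job's distinct input files already cached on a node.

    Hypothetical metric: the source does not specify how locality is scored.
    """
    wanted = set(job_files)
    if not wanted:
        return 0.0
    return len(wanted & set(cached_files)) / len(wanted)


def match_jobs_to_nodes(jobs, node_caches, slots_per_node=1):
    """Greedily assign each job to the free node that caches most of its inputs.

    jobs:        dict mapping job name -> list of input file names
    node_caches: dict mapping node name -> set of cached file names
    Jobs that find no free slot are left out of the returned assignment.
    """
    free_slots = {node: slots_per_node for node in node_caches}
    assignment = {}
    for job, files in jobs.items():
        # Sort candidates so that ties are broken deterministically by node name.
        candidates = [n for n in sorted(free_slots) if free_slots[n] > 0]
        if not candidates:
            break
        best = max(candidates, key=lambda n: locality_score(files, node_caches[n]))
        assignment[job] = best
        free_slots[best] -= 1
    return assignment


# Illustrative data only: two jobs and two worker nodes with disjoint caches.
example_jobs = {"job_a": ["f1", "f2"], "job_b": ["f3"]}
example_caches = {"node1": {"f3"}, "node2": {"f1", "f2"}}
example_assignment = match_jobs_to_nodes(example_jobs, example_caches)
```

In this toy setup each job lands on the node that already holds its input files, so no file has to be cached twice. A real coordinator would also have to decide which data to place where, which this sketch leaves out.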

Publisher

EDP Sciences
