Understanding Data Access Patterns for dCache System

Authors:

Bellavita Julian, Sim Caitlin, Wu Kesheng, Sim Alex, Yoo Shinjae, Ito Hiro, Garonne Vincent, Lancon Eric

Abstract

The storage management system dCache acts as a disk cache for high-energy physics (HEP) data from the US ATLAS community. Because its disk capacity is considerably smaller than the total volume of ATLAS data, a heuristic is needed to decide which data should be kept on disk. An effective heuristic is to keep the data files that are expected to be heavily accessed in the near future. Through a careful study of access statistics, we find that a few of the most popular datasets are accessed much more frequently than the others, even though the set of popular datasets changes over time. If we could predict the near-term popularity of datasets, we could pin the most popular ones in the disk cache to prevent their accidental removal and guarantee their availability. To predict a dataset's popularity, we present several methods for forecasting the number of times a dataset will be accessed in the next day. Test results show that these methods can reliably predict the next-day access counts of popular datasets. This observation is confirmed with dCache logs from two separate time ranges.
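
The idea of forecasting next-day access counts and pinning the top datasets can be illustrated with a minimal sketch. The abstract does not specify the forecasting methods, so the moving-average baseline below is a stand-in chosen for illustration and not the authors' approach. The log format (pairs of day index and dataset name) and the function names `daily_counts`, `forecast_next_day`, and `datasets_to_pin` are all hypothetical.

```python
from collections import defaultdict


def daily_counts(access_log):
    """Count accesses per dataset per day.

    access_log: iterable of (day_index, dataset_name) pairs.
    This format is an assumption, not the real dCache log schema.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for day, dataset in access_log:
        counts[dataset][day] += 1
    return counts


def forecast_next_day(counts, today, window=3):
    """Forecast next-day accesses as the mean over the last `window` days.

    A simple moving-average baseline, used here only as a placeholder
    for whatever forecasting method is actually applied.
    """
    forecast = {}
    for dataset, per_day in counts.items():
        recent = [per_day.get(d, 0) for d in range(today - window + 1, today + 1)]
        forecast[dataset] = sum(recent) / window
    return forecast


def datasets_to_pin(forecast, k):
    """Return the k datasets with the highest forecast (ties broken by name)."""
    ranked = sorted(forecast.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Toy log covering days 0..2 for three made-up datasets.
log = [(0, "A"), (0, "A"), (1, "A"), (2, "A"), (2, "A"), (2, "A"),
       (1, "B"), (2, "B"), (0, "C")]
forecast = forecast_next_day(daily_counts(log), today=2, window=3)
print(datasets_to_pin(forecast, k=2))
```

In this toy log, dataset "A" dominates the accesses, so it ranks first among the pin candidates. A real deployment would also have to account for dataset sizes and the available disk capacity, which this sketch ignores.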

Publisher

EDP Sciences

