LHF: A New Archive based Approach to Accelerate Massive Small Files Access Performance in HDFS

Published: 2019-02-07
ISSN: 2516-2314
Container-title: EasyChair Preprints

Author:

Tao Wenjun, Zhai Yanlong, Tchaye-Kondi Jude

Abstract

As one of the most popular open source projects, Hadoop is now considered the de facto framework for managing and analyzing huge amounts of data. HDFS (Hadoop Distributed File System) is one of the core components of the Hadoop framework for storing big data, especially semi-structured and unstructured data. HDFS provides high scalability and reliability when handling large files across thousands of machines, but its performance degrades severely when dealing with massive numbers of small files. Although some effort has been spent investigating this well-known issue, existing approaches, such as HAR, SequenceFile, and MapFile, are limited in their ability to reduce the memory consumption of the NameNode while also optimizing access performance. In this paper, we present LHF, a solution for handling massive numbers of small files in HDFS by merging small files into big files and building a linear-hashing-based extendable index to speed up the process of locating a small file. The advantages of our approach are that (1) it significantly reduces the size of the metadata, (2) it does not require sorting the files at the client side, (3) it supports appending more small files to the merged file afterwards, and (4) it achieves good access performance. A series of experiments was performed to demonstrate the effectiveness and efficiency of LHF, which takes less time to access files than the other methods.
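
The idea of merging small files and locating them through a linear hashing index can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `LinearHashIndex` and `MergedFile` are hypothetical, an in-memory `bytearray` stands in for a merged file stored in HDFS, and CRC32 is an assumed hash function chosen only for determinism. The sketch shows why no client-side sorting is needed and why later appends are cheap: each new file adds one index entry, and the table grows by splitting one bucket at a time.

```python
import zlib


class LinearHashIndex:
    """Linear hashing table mapping a file name to (offset, length)."""

    def __init__(self, initial_buckets=2, max_load=2.0):
        self.n0 = initial_buckets   # bucket count at level 0
        self.level = 0              # completed doubling rounds
        self.split = 0              # next bucket to split in this round
        self.max_load = max_load    # average entries per bucket before a split
        self.count = 0
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, name):
        h = zlib.crc32(name.encode("utf-8"))
        idx = h % (self.n0 << self.level)
        if idx < self.split:        # this bucket was already split this round
            idx = h % (self.n0 << (self.level + 1))
        return idx

    def put(self, name, offset, length):
        bucket = self.buckets[self._address(name)]
        for i, (key, _, _) in enumerate(bucket):
            if key == name:         # re-added name: point at the newest copy
                bucket[i] = (name, offset, length)
                return
        bucket.append((name, offset, length))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_one()

    def _split_one(self):
        # Split exactly one bucket; the table grows by one bucket per split.
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])
        self.split += 1
        if self.split == self.n0 << self.level:
            self.level += 1         # round finished: table size has doubled
            self.split = 0
        for entry in old:           # redistribute between old and new bucket
            self.buckets[self._address(entry[0])].append(entry)

    def get(self, name):
        for key, offset, length in self.buckets[self._address(name)]:
            if key == name:
                return offset, length
        return None


class MergedFile:
    """In-memory stand-in for one large merged file plus its index."""

    def __init__(self):
        self.data = bytearray()
        self.index = LinearHashIndex()

    def append(self, name, content):
        # Files are appended in arrival order; no sorting is required.
        self.index.put(name, len(self.data), len(content))
        self.data += content

    def read(self, name):
        location = self.index.get(name)
        if location is None:
            raise KeyError(name)
        offset, length = location
        return bytes(self.data[offset:offset + length])
```

In this sketch a lookup hashes the file name, scans a single bucket, and then reads one contiguous byte range, so the cost does not depend on the order in which files were merged.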

Publisher

EasyChair

Cited by 12 articles.

1. Small files access efficiency in hadoop distributed file system a case study performed on British library text files;Cluster Computing;2023-04-07

2. Localisation-Safe Reinforcement Learning for Mapless Navigation;2022 IEEE International Conference on Robotics and Biomimetics (ROBIO);2022-12-05

3. Disentangled Speaker Representation Learning via Mutual Information Minimization;2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC);2022-11-07

4. Curriculum Adversarial Training for Robust Reinforcement Learning;2022 International Joint Conference on Neural Networks (IJCNN);2022-07-18

5. Viscosity Limits for Zeroth‐Order Pseudodifferential Operators;Communications on Pure and Applied Mathematics;2022-06-15