Abstract
Our society critically depends on data, Big Data. Humanity generates and moves larger data volumes than ever before, and their growth is continuously accelerating. The goal of this research is to evaluate tools used for the transfer of large volumes of data. Bulk data transfer is a complex endeavour that requires not only sufficient network infrastructure, but also appropriate software, computing power, and storage resources. We report on a series of storage benchmarks conducted using the recently developed elbencho tool. The tests were conducted with the objective of understanding and avoiding I/O bottlenecks during data transfer operations. Subsequently, the performance of Ethernet and InfiniBand networks was compared using the Ohio State University bandwidth benchmark (OSU BW) and the iperf3 tool. For comparison, we also tested the traditional (very inefficient) Linux scp and rsync commands, as well as tools designed specifically to transfer large datasets more efficiently: bbcp and MDTMFTP. Additionally, the impact of simultaneous multi-threading and Ethernet jumbo frames on transfer rates was evaluated.
Publisher: Springer International Publishing