Reducing communication in parallel graph search algorithms with software caches-Reference-Cited by-同舟云学术

Reducing communication in parallel graph search algorithms with software caches

Published:2018-04-15 Issue:2 Volume:33 Page:384-396
ISSN:1094-3420
Container-title:The International Journal of High Performance Computing Applications
language:en
Short-container-title:The International Journal of High Performance Computing Applications

Author:

Cicotti Pietro¹,Shantharam Manu¹,Carrington Laura¹

Affiliation:

1. San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, USA

Abstract

In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: few high-degree vertexes connect many low-degree vertexes. Despite the randomness in a graph search, it is possible to capitalize on the characteristics of small-world networks and cache relevant information of high-degree vertexes. We applied this idea by caching remote vertex ids in a parallel breadth-first search benchmark. Our experiment with different implementations demonstrated significant performance improvements over the reference implementation in several configurations, using 64 to 1024 cores. We proposed a system design in which resources are dedicated exclusively to caching and shared among a set of nodes. Our evaluation demonstrates that this design reduces communication and has the potential to improve performance on large-scale systems in which the communication cost increases significantly with the distance between nodes. We also tested a memcached system as the cache server finding that its generic protocol, which does not match our usage semantics, hinders significantly the potential performance improvements and suggested that a generic system should also support a basic and lightweight communication protocol to meet the needs of high-performance computing applications. Finally, we explored different configurations to find efficient ways to utilize the resources allocated to solve a given problem size; to this extent, we found utilizing half of the compute cores per allocated node improves performance, and even in this case, caching variants always outperform the reference implementation.

Funder

NSF

Publisher

SAGE Publications

Subject

Hardware and Architecture,Theoretical Computer Science,Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342018762510

Reference28 articles.

1. Scalable Graph Exploration on Multicore Processors

2. On the Architectural Requirements for Efficient Execution of Graph Algorithms

3. Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2

4. Direction-optimizing Breadth-First Search