Affiliation:
1. Departamento de Informática, Universidad de Valencia, Spain
2. Departamento Ingeniería y Tecnología de Computadores, Universidad de Murcia, Spain
Abstract
The computing capabilities of current multi-core and many-core architectures have been used in crowd simulations for both enhancing crowd rendering and simulating continuum crowds. However, improving the scalability of crowd simulation systems by exploiting the inherent parallelism of these architectures is still an open issue. In this paper, we propose different parallelization strategies for the collision check procedure that takes place in agent-based simulations. These strategies are designed for exploiting the parallelism in both multi-core and many-core architectures like graphic processing units (GPUs). As for the many-core implementations, we analyse the bottlenecks of a previous GPU version of the collision check algorithm, proposing a new GPU version that removes the bottlenecks detected. In order to fairly compare the GPU with the multi-core implementations, we propose a parallel CPU version that uses read--copy update (RCU), a new synchronization method which significantly improves performance. We perform a comparison study of these different implementations. On the one hand, the comparison study shows the first performance evaluation of RCU in a real user-space application with complex data structures. On the other hand, the comparison shows that the GPU greatly accelerates the collision test with respect to any other implementation optimized for multi-core CPUs. In addition, we analyse the efficiency of the different implementations taking into account the theoretical performance and power consumption of each platform. The evaluation results show that the GPU-based implementation consumes less energy and provides a minimum speedup of 45× with respect to any of the CPU-based implementations. Since interactivity is a hard constraint in crowd simulations, this acceleration of the collision check process represents a significant improvement in the overall system throughput and response time. Therefore, the simulations are significantly accelerated, and the system throughput and scalability are improved.
Subject
Hardware and Architecture,Theoretical Computer Science,Software
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献