[1] TCMalloc (online), available from <http://goog-perftools.sourceforge.net/doc/tcmalloc.html>.
[2] The 2nd Parallel Programming Contest on Cluster Systems, available from <https://www2.cc.u-tokyo.ac.jp/procon2009-2/>.
[3] Antoniu, G., Bouge, L. and Namyst, R.: An Efficient and Transparent Thread Migration Scheme in the PM2 Runtime System, Proc. IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing, pp.496-510 (1999).
[4] Antoniu, G. and Perez, C.: Using Preemptive Thread Migration to Load-Balance Data-Parallel Applications, Proc. 5th International Euro-Par Conference on Parallel Processing, pp.117-124 (1999).
[5] Brightwell, R. and Pedretti, K.: Optimizing Multi-core MPI Collectives with SMARTMAP, 2009 International Conference on Parallel Processing Workshops, pp.370-377 (Sep. 2009).