Affiliation:
1. Academia Sinica
2. National Taiwan University
Abstract
Transactional Synchronization Extensions (TSX) have been introduced for hardware transactional memory since the 4th generation Intel Core processors. TSX provides two software programming interfaces: Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM). HLE is easy to use and maintains backward compatibility with processors without TSX support, while RTM is more flexible and scalable. Previous researches have shown that critical sections protected by RTM with a well-designed retry mechanism as its fallback code path can often achieve better performance than HLE. More parallel programs may be programmed in HLE, however, using RTM may obtain greater performance. To embrace both productivity and high performance of parallel programs with TSX, we present a framework built on QEMU that can dynamically transform HLE instructions in an application binary to fragments of RTM codes with adaptive tuning on the fly. Compared to HLE execution, our prototype achieves 1.56 x speedup with 8 threads on average. Due to the scalability of RTM, the speedup will be more significant as the number of threads increases.
Publisher
Association for Computing Machinery (ACM)