1. In-Datacenter Performance Analysis of a Tensor Processing Unit
2. A detailed and flexible cycle-accurate Network-on-Chip simulator
3. CSR5
4. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization. Kung. Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
5. BiELL: A bisection ELLPACK-based storage format for optimizing SpMV on GPUs