1. Deinsum: Practically I/O Optimal Multi-Linear Algebra;SC22: International Conference for High Performance Computing, Networking, Storage and Analysis;2022-11
2. Symmetric Block-Cyclic Distribution: Fewer Communications Leads to Faster Dense Cholesky Factorization;SC22: International Conference for High Performance Computing, Networking, Storage and Analysis;2022-11
3. On the parallel I/O optimality of linear algebra kernels;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2021-11-13
4. Pebbles, Graphs, and a Pinch of Combinatorics;Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures;2021-07-06
5. Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths;2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2021-05