1. Analysis and Optimization of Block LU Decomposition for Execution on Tightly Coupled Processor Arrays;2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP);2024-07-24
2. A new hybrid GPU-CPU sparse LDLT factorization algorithm with GPU and CPU factorizing concurrently;Journal of Computational Science;2024-07
3. Optimizing General Matrix Multiplications on Modern Multi-core DSPs;2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2024-05-27
4. Can GPU performance increase faster than the code error rate?;The Journal of Supercomputing;2024-04-18
5. Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU Parallelism;Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region;2024-01-18