1. Accelerating Neural Network Training using Arbitrary Precision Approximating Matrix Multiplication Algorithms
2. Tal Ben-Nun and Torsten Hoefler. 2019. Demystifying parallel and distributed deep learning: an in-depth concurrency analysis. ACM Comput. Surv., 52, 4, Article 65, (Aug. 2019).
3. Yuze Chi, Licheng Guo, and Jason Cong. 2022. Accelerating SSSP for power-law graphs. In FPGA.
4. Young-kyu Choi, Yuze Chi, Weikang Qiao, Nikola Samardzic, et al. 2021. HBM connect: high-performance HLS interconnect for FPGA HBM. In FPGA, 116–126.
5. Young-kyu Choi, Yuze Chi, Jie Wang, Licheng Guo, et al. 2020. When HLS meets FPGA HBM: benchmarking and bandwidth optimization. (2020).