1. Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29
2. XiNet: Efficient Neural Networks for tinyML;2023 IEEE/CVF International Conference on Computer Vision (ICCV);2023-10-01
3. MPI+X: Massive Parallelization and Dynamic Load Balance of a Production-level Unstructured DSMC Solver;2023-06-29
4. Optimizing Depthwise Convolutions on ARMv8 Architecture;Parallel and Distributed Computing, Applications and Technologies;2023
5. Performance Optimization and Analysis of the Unstructured Discontinuous Galerkin Solver on Multi-Core and Many-Core Architectures;2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys);2022-12