1. AdaTune: Adaptive tensor program compilation made efficient; Li; Proc Adv Neural Inf Process Syst, 2020
2. A Unified Optimization Approach for CNN Model Inference on Integrated GPUs
3. Learning to optimize tensor programs; Chen; Proc 32nd Annu Conf Neural Inf Process Syst, 2018
4. Accelerating neural architecture search using performance prediction; Baker, 2017
5. CUDA streams and concurrency, NVIDIA; Rennich, 2022