Affiliation:
1. University of Southern California
Funder:
JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA
References
55 articles.
1. Amazon Web Services 2020. https://aws.amazon.com/.
2. On the complexity of scheduling problems for parallel/pipelined machines
3. InferLine
Cited by
21 articles.
1. InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference With Spatio-Temporal Sharing;IEEE Transactions on Parallel and Distributed Systems;2024-10
2. DeInfer: A GPU resource allocation algorithm with spatial sharing for near-deterministic inferring tasks;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12
3. Splitwise: Efficient Generative LLM Inference Using Phase Splitting;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29
4. ARISE: High-Capacity AR Offloading Inference Serving via Proactive Scheduling;Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services;2024-06-03
5. Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling;Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing;2024-06-03