Accelerating recommendation system training by leveraging popular choices-Reference-Cited by-同舟云学术

Accelerating recommendation system training by leveraging popular choices

Published:2021-09 Issue:1 Volume:15 Page:127-140
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Adnan Muhammad¹,Maboud Yassaman Ebrahimzadeh¹,Mahajan Divya²,Nair Prashant J.¹

Affiliation:

1. University of British Columbia

2. Microsoft

Abstract

Recommender models are commonly used to suggest relevant items to a user for e-commerce and online advertisement-based applications. These models use massive embedding tables to store numerical representation of items' and users' categorical variables (memory intensive) and employ neural networks (compute intensive) to generate final recommendations. Training these large-scale recommendation models is evolving to require increasing data and compute resources. The highly parallel neural networks portion of these models can benefit from GPU acceleration however, large embedding tables often cannot fit in the limited-capacity GPU device memory. Hence, this paper deep dives into the semantics of training data and obtains insights about the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, the accesses to the embeddings are highly skewed with a few embedding entries being accessed up to 10000X more. This paper leverages this asymmetrical access pattern to offer a framework, called FAE, and proposes a hot-embedding aware data layout for training recommender models. This layout utilizes the scarce GPU memory for storing the highly accessed embeddings, thus reduces the data transfers from CPU to GPU. At the same time, FAE engages the GPU to accelerate the executions of these hot embedding entries. Experiments on production-scale recommendation models with real datasets show that FAE reduces the overall training time by 2.3X and 1.52X in comparison to XDL CPU-only and XDL CPU-GPU execution while maintaining baseline accuracy.

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3485450.3485462

Reference73 articles.

1. The Netflix Recommender System

2. Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. Deep learning recommendation model for personalization and recommendation systems. CoRR abs/1906.00091 2019. URL https://arxiv.org/abs/1906.00091. Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. Deep learning recommendation model for personalization and recommendation systems. CoRR abs/1906.00091 2019. URL https://arxiv.org/abs/1906.00091.

3. Two Decades of Recommender Systems at Amazon.com

Cited by 19 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

2. Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

3. NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3;2024-04-27

4. RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2024-04-27

5. The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format;Proceedings of the ACM on Management of Data;2024-03-12