1. A. Vaswani et al., “Attention Is All You Need,” arXiv.org, Jun. 12, 2017. https://arxiv.org/abs/1706.03762
2. S. Black et al., “GPT-NeoX-20B: An Open-Source Autoregressive Language Model,” arXiv.org, Apr. 14, 2022. https://arxiv.org/abs/2204.06745
3. S. Rajbhandari et al., “DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale,” arXiv.org, Jan. 14, 2022. https://arxiv.org/abs/2201.05596
4. S. Rajbhandari, J. Rasley, O. Ruwase, and Y. He, “ZeRO: Memory Optimizations Toward Training Trillion Parameter Models,” arXiv.org, Oct. 04, 2019.
5. B. Y. Lin et al., “CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning,” arXiv.org, Nov. 09, 2019. https://arxiv.org/abs/1911.03705