1. Ten lessons from three generations shaped Google’s TPUv4i;jouppi;Proc Int Symp Comput Arch,0
2. Language models are few-shot learners;brown;Proc Conf Neural Inf Process Syst,0
3. GLaM: Efficient scaling of language models with mixture-of-experts;du,2021