1. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: a system for large-scale machine learning. In OSDI, Vol. 16. Savannah, GA, USA, 265–283.
2. Jonghyun Bae, Jongsung Lee, Yunho Jin, Sam Son, Shine Kim, Hakbeom Jang, Tae Jun Ham, and Jae W Lee. 2021. FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks. In FAST. 387–401.
3. Large-Scale Machine Learning with Stochastic Gradient Descent
4. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
5. Yu Cao, Wei Bi, Meng Fang, and Dacheng Tao. 2020. Pretrained language models for dialogue generation with multiple input sources. arXiv preprint arXiv:2010.07576 (2020).