1. Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, and Luke Zettlemoyer. 2021. HTLM: Hyper-text pre-training and prompting of language models. arXiv:2107.06955. Retrieved from https://arxiv.org/abs/2107.06955.
2. Zeyuan Allen-Zhu and Yuanzhi Li. 2020. Towards understanding ensemble knowledge distillation and self-distillation in deep learning. arXiv:2012.09816. Retrieved from https://arxiv.org/abs/2012.09816.
3. Devansh Arpit, Stanislaw Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, et al. 2017. A closer look at memorization in deep networks. In Proceedings of the International Conference on Machine Learning. PMLR, 233–242.
4. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15), Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1409.0473.
5. Eyal Ben-David, Nadav Oved, and Roi Reichart. 2022. PADA: Example-based prompt learning for on-the-fly adaptation to unseen domains. Transactions of the Association for Computational Linguistics 10 (2022).