Author:
Huang Yuxin,Yu Zhengtao,Guo Junjun,Xiang Yan,Yu Zhiqiang,Xian Yantuan
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Science and Technology Service Network Plan
General Projects of Basic Research in Yunnan Province
Publisher
Springer Science and Business Media LLC
Reference45 articles.
1. Ba JL, Kiros JR, Hinton GE (2016) Layer normalization. arXiv:1607.06450
2. Cao Z, Li W, Li S, Wei F (2018) Retrieve, rerank and rewrite: Soft template based neural summarization. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia. https://doi.org/10.18653/v1/P18-1015. https://www.aclweb.org/anthology/P18-1015, pp 152–161
3. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: Human language technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota. https://doi.org/10.18653/v1/N19-1423. https://www.aclweb.org/anthology/N19-1423, pp 4171–4186
4. Elbayad M, Gu J, Grave E, Auli M (2019) Depth-adaptive transformer. In: ICLR 2020-Eighth international conference on learning representations
5. Fan A, Grave E, Joulin A (2019) Reducing transformer depth on demand with structured dropout. In: International conference on learning representations
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献