1. Adafactor: Adaptive learning rates with sublinear memory cost;shazeer;International Conference on Machine Learning,2018
2. Attention Is All You Need;vaswani;CoRR,2017
3. A Better and Faster End-to-End Model for Streaming ASR
4. Conformer: Convolution-augmented Transformer for Speech Recognition
5. An empirical investigation of catastrophic forgetting in gradient-based neural networks;bengio,2013