1. Mandic DP, Chambers JA (2001) Recurrent Neural Networks for Prediction, Wiley Series in Adaptive and Learning Systems for Signal Processing, Communications, and Control, vol 4. John Wiley & Sons Ltd, Chichester, UK, p 297. https://doi.org/10.1002/047084535X
2. Medsker LR, Jain LC (2001) Recurrent Neural Networks: Design and Applications, 1st edn. CRC Press, Boca Raton
3. Pascanu R, Gulcehre C, Cho K, Bengio Y (2014) How to construct deep recurrent neural networks. In: Proceedings of the second international conference on learning representations (ICLR 2014)
4. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9
5. Merity S, Keskar NS, Socher R (2018) Regularizing and optimizing LSTM language models. In: Proceedings of the sixth international conference on learning representations (ICLR 2018), Vancouver, BC, Canada, April 30–May 3. https://openreview.net/forum?id=SyyGPP0TZ