1. Ali, F. (2023, April 11). GPT-1 to GPT-4: Each of OpenAI’s GPT models explained and compared. MakeUseOf. Retrieved August 25, 2023, from https://www.makeuseof.com/gpt-models-explained-and-compared/
2. Antoniadis, P. (2023, March 16). Activation functions: Sigmoid vs Tanh. Baeldung. Retrieved August 25, 2023, from https://www.baeldung.com/cs/sigmoid-vs-tanh-functions
3. Bhat, R. (2022). Gradient descent with momentum. Towards Data Science. Retrieved August 25, 2023, from https://towardsdatascience.com/gradient-descent-with-momentum-59420f626c8f
4. Brownlee, J. (2016, March 23). Gradient descent for machine learning. Machine Learning Mastery. Retrieved August 25, 2023, from https://machinelearningmastery.com/gradient-descent-for-machine-learning/
5. Brownlee, J. (2017, May 24). A gentle introduction to long short-term memory networks by the experts. Machine Learning Mastery. Retrieved August 25, 2023, from https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/