1. Bengio, Y., Simard, P., and Frasconi, P., 1994, Learning long-term dependencies with gradient descent is difficult: IEEE Transactions on Neural Networks, 5(2), 157–166.
2. Chung, J., Gulcehre, C., Cho, K., and Bengio, Y., 2014, Empirical evaluation of gated recurrent neural networks on sequence modeling.
3. Dey, R., and Salem, F.M., 2017, Gate-variants of gated recurrent unit (GRU) neural networks: IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), 1597–1600.
4. Frinken, V., and Uchida, S., 2015, Deep BLSTM neural networks for unconstrained continuous handwritten text recognition: 13th International Conference on Document Analysis and Recognition (ICDAR), IEEE, 911–915.
5. Gers, F., Schmidhuber, J., and Cummins, F., 2000, Learning to forget: continual prediction with LSTM: Neural Computation, 12(10), 2451–2471.