1. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems, 2015, Software available from tensorflow.org.
2. Y. Bengio, et al., Learning long-term dependencies with gradient descent is difficult, IEEE Trans. Neural Netw., 1994.
3. K. Cho, et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proc. 2014 Conf. Empir. Methods Nat. Lang. Process. (EMNLP), 2014.
4. F. Chollet, et al., Keras, 2015, (https://github.com/fchollet/keras).
5. S. Hochreiter, et al., Long short-term memory, Neural Comput., 1997.