1. Barto, A.G., Sutton, R.S., & Anderson, C.W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13(5), 834–846.
2. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. CoRR abs/1606.01540. arXiv:1606.01540.
3. Brys, T., Harutyunyan, A., Suay, H.B., Chernova, S., Taylor, M.E., & Nowé, A. (2015). Reinforcement learning from demonstration through shaping. In IJCAI, AAAI Press, pp. 3352–3358.
4. Dagan, I., & Engelson, S.P. (1995). Committee-based sampling for training probabilistic classifiers. In Machine Learning: Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, California, USA, July 9–12, 1995, pp. 150–157. https://doi.org/10.1016/b978-1-55860-377-6.50027-x.
5. Fortunato, M., Azar, M.G., Piot, B., Menick, J., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C., & Legg, S. (2017). Noisy networks for exploration. CoRR abs/1706.10295. arXiv:1706.10295.