1. Andrychowicz M, Wolski F, Ray A, et al (2017) Hindsight experience replay. In: NeurIPS, pp 5048–5058
2. Ash JT, Adams RP (2020) On warm-starting neural network training. In: Advances in neural information processing systems 33: annual conference on neural information processing systems 2020, NeurIPS 2020, December 6–12, 2020, virtual
3. Battaglia PW, Hamrick JB, Bapst V et al (2018) Relational inductive biases, deep learning, and graph networks. CoRR abs/1806.01261
4. Brafman RI, Tennenholtz M (2002) R-max-a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3:213–231
5. Das S, Natarajan S, Roy K et al (2020) Fitted q-learning for relational domains. CoRR abs/2006.05595