1. Haarnoja et al. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.09111, 2018.
2. Kingma et al. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
3. Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 2014.
4. MountainCarContinuous-v0.
5. Inverse Reinforcement Learning Approach for Elicitation of Preferences in Multi-objective Sequential Optimization