1. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov. Proximal policy optimization algorithms, [Online], Available: https://arxiv.org/abs/1707.06347, 2017.
2. T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, S. Levine. Soft actor-critic algorithms and applications, [Online], Available: https://arxiv.org/abs/1812.05905, 2018.
3. V. R. Konda, J. N. Tsitsiklis. Actor-critic algorithms. In Proceedings of the 12th International Conference on Neural Information Processing Systems, Denver, USA, pp. 1008–1014, 1999.
4. Q. L. Dang, W. Xu, Y. F. Yuan. A dynamic resource allocation strategy with reinforcement learning for multimodal multi-objective optimization. Machine Intelligence Research, vol. 19, no. 2, pp. 138–152, 2022. DOI: https://doi.org/10.1007/s11633-022-1314-7.
5. K. Kase, C. Paxton, H. Mazhar, T. Ogata, D. Fox. Transferable task execution from pixels through deep planning domain learning. In Proceedings of IEEE International Conference on Robotics and Automation, Paris, France, pp. 10459–10465, 2020. DOI: https://doi.org/10.1109/ICRA40945.2020.9196597.