1. Shipra Agrawal and Randy Jia. 2017. Posterior sampling for reinforcement learning: worst-case regret bounds. arXiv preprint arXiv:1705.07041 (2017).
2. Eitan Altman. 1999. Constrained Markov decision processes. Vol. 7. CRC Press.
3. Dimitri P. Bertsekas. 1995. Dynamic programming and optimal control. Vol. 1. Athena Scientific, Belmont, MA.
4. Dimitri P. Bertsekas. 2009. Convex optimization theory. Athena Scientific, Belmont, MA.
5. Craig Boutilier and Tyler Lu. 2016. Budget Allocation using Weakly Coupled Constrained Markov Decision Processes. UAI.