1. Multi-agent actor-critic for mixed cooperative-competitive environments;lowe;Proc Neural Inf Process Syst (NIPS),2017
2. Reducing estimation bias via weighted delayed deep deterministic policy gradient;he;arXiv:2006.12622,2020
3. Neural machine translation by jointly learning to align and translate;bahdanau;Proc 3rd Int Conf Learn Represent (ICLR),2015
4. Device placement optimization with reinforcement learning;mirhoseini;Proc 34th Int Conf Mach Learn (PMLR),2017
5. Neural combinatorial optimization with reinforcement learning;bello;arXiv:1611.09940,2016