1. MIMIC-III, a freely accessible critical care database;Johnson;Scientific Data,2016
2. Cross DQN: cross deep Q network for ads allocation in feed;Liao;CoRR,2021
3. Way off-policy batch deep reinforcement learning of implicit human preferences in dialog;Jaques;CoRR,2019
4. Off-policy deep reinforcement learning without exploration;Fujimoto;International Conference on Machine Learning,2019
5. Stabilizing off-policy Q-learning via bootstrapping error reduction;Kumar;Advances in Neural Information Processing Systems,2019