1. Deep exploration via bootstrapped DQN;osband;Advances in neural information processing systems,2016
2. DQN-TAMER: Human-in-the-loop reinforcement learning with intractable feedback;arakawa;arXiv,2018
3. Brief Introduction of Back Propagation (BP) Neural Description of BP Algorithm in Mathematics;li;Adv Comput Inf Syst Sci Eng,2012
4. Learning from delayed rewards;watkins;PhD thesis University of Cambridge,1989