1. Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning, Vol. 47, 2 (2002), 235--256.
2. Callum Baillie, Maxwell Standen, Jonathon Schwartz, Michael Docking, David Bowman, and Junae Kim. 2020. Cyborg: An autonomous cyber operations research gym. arXiv preprint arXiv:2002.10667 (2020).
3. Curtis Carver, JM Hill, John R Surdu, and Udo W Pooch. 2000. A methodology for using intelligent agents to provide automated intrusion response. In Proceedings of the IEEE Systems, Man, and Cybernetics Information Assurance and Security Workshop, West Point, NY. 110--116.
4. Richard Elderman, Leon JJ Pater, Albert S Thie, Madalina M Drugan, and Marco A Wiering. 2017. Adversarial Reinforcement Learning in a Cyber Security Simulation. In ICAART (2). 559--566.
5. Jerzy Filar and Koos Vrieze. 1997. Competitive Markov decision processes. Springer-Verlag.