1. Templates for Chat Models (2023). https://huggingface.co/docs/transformers/chat_templating
2. The Trojan Detection Challenge (LLM Edition) (2023). https://trojandetection.ai
3. Alon, G., Kamfonas, M.: Detecting language model attacks with perplexity (2023). arXiv:2308.14132
4. Anthropic: Claude 2. Anthropic (2023). https://www.anthropic.com/index/claude-2
5. Armstrong, S., Gorman, R.: Using GPT-Eliezer against ChatGPT Jailbreaking (2022). https://www.alignmentforum.org/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking