This is a recording of the session we held on October 17, 2024 with Johann Rehberger. During this session, Johann answered questions from the MLSecOps Community related to securing AI and machine learning (ML) systems, focusing on red teaming and attack strategies.
Explore with us:
- The big question we all want to know: how does Johann define the term “AI Red Teaming;” a term that's been highly debated within the industry recently?
- How can "traditional" red teamers & penetration testers adapt some of their current processes to the world of cybersecurity for AI/ML?
- How are LLMs uniquely challenging to red teamers, compared to conventional AI/ML models? Are there specific red teaming strategies you recommend for LLMs?
- Can you walk us through some of the more creative or less-discussed attack vectors that you've encountered while testing ML systems/LLMs?
- Do you have any predictions about how the threat of prompt injection will evolve as LLMs become more widely adopted?
- Since prompt filters don't work well on semantic attacks or epistemological attacks on models, What are ways to deal with these types of Zero Day threats?
- Have you seen Homoglyphs used in the wild or used Homoglyphs in your research to test limits?
- Have you noticed any advancements in adversarial attacks recently? If so, how can we better prepare for them?
- How would you (comparatively) view the frequency of tests against models as opposed to surrounding systems (for example RAG architectures, ...)?
- What are the most common vulnerabilities you find in AI and ML systems? (Hint: it's not what we might have assumed, audience!)
- In your experience, how frequently do attacks designed for one ML model successfully transfer to other models in the same environment? Any related precautions you’d recommend that organizations take?
- What kind of assessments have you already done?
- What monitoring strategy do you recommend?
- Is it possible to have a reliable real-time monitoring strategy at a reasonable cost?
- How do you carry out the evaluations for this?
- How do you feel about assessing AI risks (models and systems) with existing methods like CVSS?
- In security, firewalls have been known to have lots of false alarms, how do you see the AI guardrails/firewall working in the case of modern day agentic AI applications, where the real attacks are actually chained rather than a single-point prompt injection?
- What resources have you used to progress in this field/what resources would you recommend to the audience?
Plus, additional questions sprinkled throughout that came in from the live chat!
Session references & resources (including time stamp from video mention):
(00:31) Johann Rehberger's guest appearance on the MLSecOps Podcast - "Red Teaming, Threat Modeling, and Attack Methods of AI Apps"
(00:53) Johann's Embrace the Red blog "Machine Learning Attack Series"
(02:20) Andrew Ng's DeepLearning.ai courses
(07:10) The Johari Window Model
(18:39) Article re: Riley Goodside research - "Invisible text that AI chatbots understand and humans can’t? Yep, it’s a thing."
(31:33) ASCII smuggling technique, article - "Microsoft Fixes ASCII Smuggling Flaw That Enabled Data Theft from Microsoft 365 Copilot"
(36:35) NIST CVSS info
(42:12) Resources recommended by Johann: Andrew Ng’s Machine Learning Collection on Coursera, LearnPrompting.org, Embrace the Red