Sign in or Join the community to continue

Securing AI: Red Teaming & Attack Strategies for Machine Learning Systems

Posted Nov 01, 2024 | Views 834

# AI Security

# AI/ML Red Teaming

# Ethical Hacking

# Pen Testing

# Prompt Injection

# Threat Research

Share

speaker

Johann Rehberger

Red Team Director | Hacker | Entrepreneur @ **

Johann has over eighteen years of experience in threat analysis, threat modeling, risk management, penetration testing, and red teaming. As part of his many years at Microsoft, Johann established an offensive security team in Azure Data and led the program as Principal Security Engineering Manager for years. He also built out a red team at Uber and currently works as the Director of Red team at Electronic Arts. He enjoys providing training and was an instructor for ethical hacking at the University of Washington. Johann contributed to the MITRE ATT&CK framework (Pass those Cookies!), published a book on how to build and manage internal red teams, enjoys hacking machine learning systems and holds a master’s in computer security from the University of Liverpool. For latest updates and information visit his blog at Embrace the Red.

+ Read More

SUMMARY

Welcome to "MLSecOps Connect: Ask the Experts," an educational live stream series from the MLSecOps Community where attendees have the opportunity to hear their own questions answered by a variety of insightful guest speakers.

+ Read More

TRANSCRIPT

This is a recording of the session we held on October 17, 2024 with Johann Rehberger. During this session, Johann answered questions from the MLSecOps Community related to securing AI and machine learning (ML) systems, focusing on red teaming and attack strategies.

Explore with us:

The big question we all want to know: how does Johann define the term “AI Red Teaming;” a term that's been highly debated within the industry recently?
How can "traditional" red teamers & penetration testers adapt some of their current processes to the world of cybersecurity for AI/ML?
How are LLMs uniquely challenging to red teamers, compared to conventional AI/ML models? Are there specific red teaming strategies you recommend for LLMs?
Can you walk us through some of the more creative or less-discussed attack vectors that you've encountered while testing ML systems/LLMs?
Do you have any predictions about how the threat of prompt injection will evolve as LLMs become more widely adopted?
Since prompt filters don't work well on semantic attacks or epistemological attacks on models, What are ways to deal with these types of Zero Day threats?
Have you seen Homoglyphs used in the wild or used Homoglyphs in your research to test limits?
Have you noticed any advancements in adversarial attacks recently? If so, how can we better prepare for them?
How would you (comparatively) view the frequency of tests against models as opposed to surrounding systems (for example RAG architectures, ...)?
What are the most common vulnerabilities you find in AI and ML systems? (Hint: it's not what we might have assumed, audience!)
In your experience, how frequently do attacks designed for one ML model successfully transfer to other models in the same environment? Any related precautions you’d recommend that organizations take?
What kind of assessments have you already done?
What monitoring strategy do you recommend?
Is it possible to have a reliable real-time monitoring strategy at a reasonable cost?
How do you carry out the evaluations for this?
How do you feel about assessing AI risks (models and systems) with existing methods like CVSS?
In security, firewalls have been known to have lots of false alarms, how do you see the AI guardrails/firewall working in the case of modern day agentic AI applications, where the real attacks are actually chained rather than a single-point prompt injection?
What resources have you used to progress in this field/what resources would you recommend to the audience?

Plus, additional questions sprinkled throughout that came in from the live chat!

Session references & resources (including time stamp from video mention):

(00:31) Johann Rehberger's guest appearance on the MLSecOps Podcast - "Red Teaming, Threat Modeling, and Attack Methods of AI Apps"

(00:53) Johann's Embrace the Red blog "Machine Learning Attack Series"

(02:20) Andrew Ng's DeepLearning.ai courses

(07:10) The Johari Window Model

(18:39) Article re: Riley Goodside research - "Invisible text that AI chatbots understand and humans can’t? Yep, it’s a thing."

(31:33) ASCII smuggling technique, article - "Microsoft Fixes ASCII Smuggling Flaw That Enabled Data Theft from Microsoft 365 Copilot"

(36:35) NIST CVSS info

(42:12) Resources recommended by Johann: Andrew Ng’s Machine Learning Collection on Coursera, LearnPrompting.org, Embrace the Red

+ Read More

Sign in or Join the community

Like

Comments (0)

Popular

Watch More

Behind the Scenes of AI Security: Red Teaming Strategies and Innovations

Posted Jan 14, 2025 | Views 559

Key Insights for CISOs: Securing AI in Your Organization

Posted Mar 17, 2025 | Views 192

# AI Risk

# AI Security

# Generative AI

# Threat Model

# MLSecOps

# Incident Response

# Governance, Risk, & Compliance

Securing AI/ML with Ian Swanson

Posted Jun 27, 2024 | Views 717

# AI Security

# AI Risk

# MLSecOps

# Model Scanning

# Model Provenance

# AI-SPM

# AI Agents

# AI/ML Red Teaming

# LLM