Q: Currently, there are many interpretations of the term “AI red teaming.” How do you define it?
A: I started Microsoft’s AI Red Team in 2018 with the goal of thinking like an attacker to identify risks, uncover blind spots, validate assumptions, and probe AI systems for failures. Microsoft’s AI Red Team leverages a dedicated interdisciplinary group of security, adversarial machine learning, and responsible AI experts. We also employ resources from the entire Microsoft ecosystem, including the Fairness center in Microsoft Research; AETHER, Microsoft’s cross-company initiative on AI Ethics and Effects in Engineering and Research; and the Office of Responsible AI.
AI red teaming generally takes place at two levels: the base model level (e.g. GPT-4) and the application level (e.g. Security Copilot, which uses GPT-4 in the back end). Both levels bring their own advantages: for instance, red teaming the model helps to identify early in the process how models can be misused, to scope capabilities of the model, and to understand the model’s limitations.
Q: What are the key differences between “traditional” red teaming and AI red teaming? Could you explain how the approach to testing AI systems for vulnerabilities differs from conventional security testing, particularly in terms of the risks unique to AI?
A: While AI red teaming encompasses many of the traditional hacking techniques and software attack vectors, like serialization threats, overly broad permissions, and weak encryption, it also addresses unique threats to AI systems. These unique risks include things like prompt injection, model poisoning, and responsible AI (RAI) risks such as fairness issues, plagiarism, and harmful content. Both pillars can vary widely, and teams need to explore those two spaces simultaneously. Additionally, our job is more probabilistic than traditional red teaming. While executing an attack path on traditional software systems will almost always give you the same result, AI systems can, and do, provide different outputs for the same input.
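To make the probabilistic point concrete, here is a minimal Python sketch of how one might repeat the same attack prompt many times and report a success rate instead of a single pass/fail result. It is an illustration under stated assumptions, not Microsoft's tooling: `toy_model`, its 20% unsafe-response rate, and `attack_success_rate` are all hypothetical names and values invented for this example.

```python
import random
from typing import Callable


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a nondeterministic AI system (not a real API)."""
    # Assumption for illustration: the same prompt yields an unsafe
    # completion about 20% of the time and a refusal otherwise.
    return "UNSAFE" if rng.random() < 0.2 else "REFUSED"


def attack_success_rate(
    model: Callable[[str, random.Random], str],
    prompt: str,
    trials: int,
    seed: int = 0,
) -> float:
    """Send the identical prompt `trials` times and return the unsafe fraction."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    hits = sum(model(prompt, rng) == "UNSAFE" for _ in range(trials))
    return hits / trials


rate = attack_success_rate(toy_model, "same prompt every time", trials=1000)
print(f"attack succeeded in {rate:.1%} of trials")
```

A single run against such a system could report either a refusal or a failure, so a sketch like this treats each attack as an experiment with a measured rate rather than a one-shot check.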
Q: Does AI red teaming help address adversarial machine learning threats? How? Can you provide examples of common adversarial tactics used against AI models and how red teaming identifies and mitigates these attacks?
A: In February 2024, Microsoft published research in collaboration with OpenAI on emerging threats in the age of AI. Our analysis of the current use of large language model (LLM) technology by threat actors revealed behaviors consistent with attackers using AI as another productivity tool on the offensive landscape, including prompt injections, attempted misuse of LLMs, and fraud. Microsoft and OpenAI have not yet observed particularly novel or unique AI-enabled attack or abuse techniques resulting from threat actors’ usage of AI, but we will continue studying this landscape closely.
To address this new category of threats, Microsoft has announced a set of principles, policies, and actions that aim to mitigate the risks associated with the use of our AI tools and APIs by nation-state advanced persistent threats (APTs), advanced persistent manipulators (APMs), and cybercriminal syndicates we track.
These principles include:
- Identification and action against malicious threat actors’ use: Upon detection of the use of any Microsoft AI technology by an identified malicious threat actor, Microsoft will take appropriate action to disrupt their activities.
- Notification to other AI service providers: When we detect a threat actor’s use of another service provider’s AI, Microsoft will promptly notify the service provider and share relevant data.
- Collaboration with other stakeholders: Microsoft will collaborate with other stakeholders to regularly exchange information about detected threat actors’ use of AI. This collaboration aims to promote collective, consistent, and effective responses to ecosystem-wide risks.
- Transparency: As part of our ongoing efforts to advance responsible use of AI, Microsoft will inform the public and stakeholders about actions taken under these threat actor principles, including the nature and extent of threat actors’ use of AI detected within our systems and the measures taken against them, as appropriate.
Q: What advice would you give to companies just starting to integrate AI red teaming into their security strategies?
A: It’s important for every security organization to implement AI Red Teaming practices. That’s why my team at Microsoft has released open-source frameworks such as Counterfit and the Python Risk Identification Toolkit for generative AI, or PyRIT, to help security professionals and machine learning engineers map potential risks to their AI systems.
Our team has published a guide to help others build AI Red Teams for LLMs, which includes steps like familiarizing yourself with the concept of AI Red Teaming, clearly establishing and defining objectives, assembling a diverse team, executing red teaming exercises and tests, analyzing results and more. Feel free to read more about the tools and steps we suggest implementing here.