MLSecOps Community

Trojan Model Hubs: Hacking the ML Supply Chain and Defending Yourself from Threats

Posted Oct 02, 2024 | Views 142
# MLSecOps
# Adversarial ML
# AI Security
# Data Poisoning
# Model Security
# Supply Chain Vulnerability
# Threat Research
SPEAKERS
Sam Washko
Senior Software Engineer @ Protect AI

Sam Washko is a senior software engineer passionate about the intersection of security and software development. She works for Protect AI developing tools for making machine learning systems more secure and is the lead engineer on ModelScan, an open source tool for scanning ML model files for attacks. She holds a BS in Computer Science from Duke University, and prior to joining Protect AI, she was part of the Azure SQL Security Feature Team and Blue Team at Microsoft, designing cryptography and telemetry systems. She has a passion for connecting theory and problem solving with engineering to produce solutions that make computing more secure for everyone.

William Armiros
Senior Software Engineer @ Protect AI

William is a Senior Software Engineer at Protect AI, where he is building systems to help ML engineers and data scientists introduce security into their MLOps workflows effortlessly. Previously, he led a team at Amazon Web Services (AWS) working on application observability and distributed tracing. During that time he contributed to the industry-wide OpenTelemetry standard and helped lead the effort to release an AWS-supported distribution of it. He is passionate about making the observability and security of AI-enabled applications as seamless as possible.

SUMMARY

In the fast-moving world of Artificial Intelligence (AI) and Machine Learning (ML), ensuring model and data integrity is a must. Sam Washko and Will Armiros (Sr. Software Engineers, Protect AI) joined our MLSecOps Community Meetup on September 10, 2024 to talk about ML supply chain vulnerabilities and defenses. Some of their key insights on model serialization attacks, data poisoning, and the bleeding-edge tools developed to keep your AI safe are included below.

TRANSCRIPT

Hacking Serialized Files

The old Trojan horse viruses that tried to sneak malicious code onto your system have evolved for the AI era. Machine learning models are often serialized, converted into a byte stream, so they can be saved or shared. This essential step, however, can be a gateway for attackers: hackers can inject rogue instructions directly into the serialized files used to deploy trained models. When the file is deserialized, that code executes silently and can harm the system without affecting the model's performance at all. These threats often go unnoticed by traditional security tools because the payload hides inside the serialized model itself.
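To make the mechanism concrete, here is a minimal, self-contained sketch of a pickle-based payload. The `MaliciousPayload` class and the echoed command are hypothetical; the point is that any object whose `__reduce__` method returns a callable and its arguments will have that callable invoked during loading.

```python
import pickle


class MaliciousPayload:
    """Hypothetical attacker-controlled object embedded in a 'model' file."""

    def __reduce__(self):
        # pickle calls os.system(...) at load time, with no warning to the user.
        import os
        return (os.system, ("echo 'arbitrary code ran during model load'",))


# The attacker serializes the payload alongside (or instead of) real model data...
tainted_bytes = pickle.dumps(MaliciousPayload())

# ...and the victim triggers it simply by loading the file.
pickle.loads(tainted_bytes)
```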

Popular model serialization formats like pickle, used by frameworks such as PyTorch, Keras, and TensorFlow, are particularly vulnerable to these attacks. Protect AI's research found 3,354 vulnerable files on Hugging Face, 41% of which were not flagged by existing detection tools.
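One way to catch such payloads without ever executing them is to inspect the pickle opcode stream statically. The sketch below uses Python's standard `pickletools` module and a small, illustrative blocklist; it shows the general idea behind scanners like ModelScan, not their actual implementation or ruleset.

```python
import os
import pickle
import pickletools

# Illustrative blocklist; a real scanner's ruleset is far more extensive.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket"}


def scan_pickle(data: bytes) -> list[str]:
    """Walk the opcode stream (no deserialization) and flag risky imports."""
    findings, pushed_strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            pushed_strings.append(str(arg))
        elif opcode.name == "GLOBAL":  # older protocols: "module name" as text
            if str(arg).split(" ")[0] in SUSPICIOUS_MODULES:
                findings.append(f"suspicious import: {arg}")
        elif opcode.name == "STACK_GLOBAL" and len(pushed_strings) >= 2:
            module, name = pushed_strings[-2], pushed_strings[-1]
            if module in SUSPICIOUS_MODULES:
                findings.append(f"suspicious import: {module}.{name}")
    return findings


# Demo: pickling os.system produces a global reference that the scan flags.
print(scan_pickle(pickle.dumps(os.system)))
```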

Data Poisoning AI Systems

The threats go beyond tainted model files, however: the entire machine learning pipeline is an attack surface, and data and model weight poisoning are among the most significant risks. Data poisoning involves contaminating training datasets to alter a model's behavior. For example, an attacker could subtly alter training images so that a stop sign is misclassified as a yield sign. If the poisoned model is deployed in an autonomous vehicle, the consequences could be catastrophic.
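As a toy illustration of the concept (simplified to label flipping rather than the image perturbations described above, and with hypothetical class ids), the sketch below shows how quietly corrupting a small fraction of training labels can change what a model learns:

```python
import random

STOP, YIELD = 0, 1  # hypothetical class ids for this example


def poison_labels(labels: list[int], flip_rate: float = 0.05, seed: int = 0) -> list[int]:
    """Flip a small fraction of STOP labels to YIELD before training."""
    rng = random.Random(seed)
    poisoned = list(labels)
    for i, label in enumerate(poisoned):
        if label == STOP and rng.random() < flip_rate:
            poisoned[i] = YIELD  # the mislabeled samples look normal at a glance
    return poisoned


clean = [STOP] * 1000 + [YIELD] * 1000
dirty = poison_labels(clean)
print(sum(c != d for c, d in zip(clean, dirty)), "of", len(clean), "labels flipped")
```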

You can no longer treat AI systems as opaque black boxes. You need full transparency into the provenance and integrity of every component - code, data, and model - across the entire machine learning supply chain. Enter the AI Bill of Materials (AI-BOM), an evolution of the Software Bill of Materials (SBOM), which details all components of a software application. AI-BOMs capture the precise versions and provenance of datasets, pre-trained models, and code dependencies that produce each unique AI model. With cryptographic attestations, you can validate authorship and detect any tampering.
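A minimal sketch of the underlying idea, hashing every supply-chain component into a manifest so later tampering is detectable, is shown below. The field names, artifact names, and layout are illustrative rather than any standard AI-BOM schema, and a real attestation would also cryptographically sign the manifest.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of a supply-chain artifact (dataset, weights, code)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_ai_bom(components: dict[str, Path]) -> dict:
    """Record name, location, and digest for every component of a model build."""
    return {
        "bom_version": "0.1",  # illustrative, not a standard schema
        "components": [
            {"name": name, "path": str(path), "sha256": sha256_of(path)}
            for name, path in components.items()
        ],
    }


# Demo with stand-in artifacts written to a temp directory.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "train.csv").write_text("image,label\nstop_001.png,stop\n")
    (root / "base_model.bin").write_bytes(b"\x00pretend-weights\x00")
    (root / "train.py").write_text("print('training code')\n")

    bom = build_ai_bom({
        "training-data": root / "train.csv",
        "base-model": root / "base_model.bin",
        "training-code": root / "train.py",
    })
    print(json.dumps(bom, indent=2))

# To verify integrity later, re-hash each artifact and compare against the
# recorded digest; any mismatch indicates tampering somewhere in the pipeline.
```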

Conclusion

Solutions like Protect AI’s Guardian can detect malicious operators in serialized files without deserializing them, scan files from model hubs like Hugging Face in real time, and block threats before they reach your system. And while creating an AI-BOM manually is daunting, Protect AI’s Radar automates the process, tracking datasets, training pipelines, and models to generate comprehensive AI-BOMs and ensure traceability and security across the entire ML lifecycle.

In the rush to operationalize AI, we cannot let security fall by the wayside. From data poisoning to payload smuggling, the evolving threats are simply too grave. Make AI-BOMs and secure scanning fundamental to your machine learning practice before skilled adversaries smuggle disasters into your deployments.

