MLSecOps Community

Trojan Model Hubs: Hacking the ML Supply Chain and Defending Yourself from Threats

Posted Oct 02, 2024 | Views 142
# MLSecOps
# Adversarial ML
# AI Security
# Data Poisoning
# Model Security
# Supply Chain Vulnerability
# Threat Research
SPEAKERS
Sam Washko
Senior Software Engineer @ Protect AI

Sam Washko is a senior software engineer passionate about the intersection of security and software development. She works for Protect AI developing tools for making machine learning systems more secure and is the lead engineer on ModelScan, an open source tool for scanning ML model files for attacks. She holds a BS in Computer Science from Duke University, and prior to joining Protect AI, she was part of the Azure SQL Security Feature Team and Blue Team at Microsoft, designing cryptography and telemetry systems. She has a passion for connecting theory and problem solving with engineering to produce solutions that make computing more secure for everyone.

William Armiros
Senior Software Engineer @ Protect AI

William is a Senior Software Engineer at Protect AI, where he is building systems to help ML engineers and data scientists introduce security into their MLOps workflows effortlessly. Previously, he led a team at Amazon Web Services (AWS) working on application observability and distributed tracing. During that time he contributed to the industry-wide OpenTelemetry standard and helped lead the effort to release an AWS-supported distribution of it. He is passionate about making the observability and security of AI-enabled applications as seamless as possible.

SUMMARY

In the fast-moving world of Artificial Intelligence (AI) and Machine Learning (ML), ensuring model and data integrity is a must. Sam Washko and Will Armiros (Sr. Software Engineers, Protect AI) joined our MLSecOps Community Meetup on September 10, 2024 to talk about ML supply chain vulnerabilities and defenses. Some of their key insights on model serialization attacks, data poisoning, and the bleeding-edge tools developed to keep your AI safe are included below.

TRANSCRIPT

Hacking Serialized Files

The old Trojan horse viruses that tried to sneak malicious code onto your system have evolved for the AI era. Machine learning models are often serialized, converted into a byte stream, so they can be saved or shared. This essential step, however, can be a gateway for attackers: hackers can inject rogue instructions directly into the serialized files used to deploy trained models. When the file is deserialized, that code executes silently and can harm the system without affecting the model's performance at all. These threats often go unnoticed by traditional security tools because the payload hides inside the serialized model itself.
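To make the mechanism concrete, here is a minimal, self-contained sketch of a pickle-based payload. The `MaliciousPayload` class and the echoed command are hypothetical; the point is that any object whose `__reduce__` method returns a callable and its arguments will have that callable invoked during loading.

```python
import pickle


class MaliciousPayload:
    """Hypothetical attacker-controlled object embedded in a 'model' file."""

    def __reduce__(self):
        # pickle calls os.system(...) at load time, with no warning to the user.
        import os
        return (os.system, ("echo 'arbitrary code ran during model load'",))


# The attacker serializes the payload alongside (or instead of) real model data...
tainted_bytes = pickle.dumps(MaliciousPayload())

# ...and the victim triggers it simply by loading the file.
pickle.loads(tainted_bytes)
```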

Popular model serialization formats like pickle, used by frameworks such as PyTorch, Keras, and TensorFlow, are particularly vulnerable to these attacks. Protect AI's research found 3,354 vulnerable files on Hugging Face, 41% of which were not flagged by existing detection tools.
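One way to catch such payloads without ever executing them is to inspect the pickle opcode stream statically. The sketch below uses Python's standard `pickletools` module and a small, illustrative blocklist; it shows the general idea behind scanners like ModelScan, not their actual implementation or ruleset.

```python
import os
import pickle
import pickletools

# Illustrative blocklist; a real scanner's ruleset is far more extensive.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket"}


def scan_pickle(data: bytes) -> list[str]:
    """Walk the opcode stream (no deserialization) and flag risky imports."""
    findings, pushed_strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            pushed_strings.append(str(arg))
        elif opcode.name == "GLOBAL":  # older protocols: "module name" as text
            if str(arg).split(" ")[0] in SUSPICIOUS_MODULES:
                findings.append(f"suspicious import: {arg}")
        elif opcode.name == "STACK_GLOBAL" and len(pushed_strings) >= 2:
            module, name = pushed_strings[-2], pushed_strings[-1]
            if module in SUSPICIOUS_MODULES:
                findings.append(f"suspicious import: {module}.{name}")
    return findings


# Demo: pickling os.system produces a global reference that the scan flags.
print(scan_pickle(pickle.dumps(os.system)))
```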

Data Poisoning AI Systems

The threats go beyond tainted model files, however: the entire machine learning pipeline is an attack surface, and data and model weight poisoning are among the most significant risks. Data poisoning involves contaminating training datasets to alter a model's behavior. For example, an attacker could subtly alter training images so that a stop sign is misclassified as a yield sign. If the poisoned model is deployed in an autonomous vehicle, the consequences could be catastrophic.
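As a toy illustration of the concept (simplified to label flipping rather than the image perturbations described above, and with hypothetical class ids), the sketch below shows how quietly corrupting a small fraction of training labels can change what a model learns:

```python
import random

STOP, YIELD = 0, 1  # hypothetical class ids for this example


def poison_labels(labels: list[int], flip_rate: float = 0.05, seed: int = 0) -> list[int]:
    """Flip a small fraction of STOP labels to YIELD before training."""
    rng = random.Random(seed)
    poisoned = list(labels)
    for i, label in enumerate(poisoned):
        if label == STOP and rng.random() < flip_rate:
            poisoned[i] = YIELD  # the mislabeled samples look normal at a glance
    return poisoned


clean = [STOP] * 1000 + [YIELD] * 1000
dirty = poison_labels(clean)
print(sum(c != d for c, d in zip(clean, dirty)), "of", len(clean), "labels flipped")
```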

You can no longer treat AI systems as opaque black boxes. You need full transparency into the provenance and integrity of every component - code, data, and model - across the entire machine learning supply chain. Enter the AI Bill of Materials (AI-BOM), an evolution of the Software Bill of Materials (SBOM), which details all components of a software application. AI-BOMs capture the precise versions and provenance of datasets, pre-trained models, and code dependencies that produce each unique AI model. With cryptographic attestations, you can validate authorship and detect any tampering.
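A minimal sketch of the underlying idea, hashing every supply-chain component into a manifest so later tampering is detectable, is shown below. The field names, artifact names, and layout are illustrative rather than any standard AI-BOM schema, and a real attestation would also cryptographically sign the manifest.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of a supply-chain artifact (dataset, weights, code)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_ai_bom(components: dict[str, Path]) -> dict:
    """Record name, location, and digest for every component of a model build."""
    return {
        "bom_version": "0.1",  # illustrative, not a standard schema
        "components": [
            {"name": name, "path": str(path), "sha256": sha256_of(path)}
            for name, path in components.items()
        ],
    }


# Demo with stand-in artifacts written to a temp directory.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "train.csv").write_text("image,label\nstop_001.png,stop\n")
    (root / "base_model.bin").write_bytes(b"\x00pretend-weights\x00")
    (root / "train.py").write_text("print('training code')\n")

    bom = build_ai_bom({
        "training-data": root / "train.csv",
        "base-model": root / "base_model.bin",
        "training-code": root / "train.py",
    })
    print(json.dumps(bom, indent=2))

# To verify integrity later, re-hash each artifact and compare against the
# recorded digest; any mismatch indicates tampering somewhere in the pipeline.
```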

Conclusion

Solutions like Protect AI’s Guardian can detect malicious operators in serialized files without deserializing them, scan files from model hubs like Hugging Face in real time, and block threats before they reach your system. And while creating an AI-BOM manually is daunting, Protect AI’s Radar automates the process, tracking datasets, training pipelines, and models to generate comprehensive AI-BOMs and ensure traceability and security across the entire ML lifecycle.

In the rush to operationalize AI, we cannot let security fall by the wayside. From data poisoning to payload smuggling, the evolving threats are simply too grave. Make AI-BOMs and secure scanning fundamental to your machine learning practice before skilled adversaries smuggle disasters into your deployments.

