Generating fake security data with Python and faker-security
Michael Aquilina
April 26, 2022
At Snyk, we recently open sourced our faker-security Python package to help anyone working with security data. In this blog post, we'll briefly go over what this Python package is and how to use it. But first, we'll look at how the factory_boy Python package can be used in combination with faker-security to improve your test-writing experience during development.
Note: Some knowledge of Python is helpful for getting the most out of this post.
Testing with Faker and factory_boy
Before diving into faker-security, it's helpful to start with what factory_boy and Faker are and how we use them within Snyk.
Snyk believes strongly in the ability of automated tests to make our code maintainable. Tests allow us to iterate and develop features quickly, and confidently make changes to our code without fearing we may inadvertently break existing features in the process.
Our commitment to testing drives us to find new ways to simplify the testing experience for the test writers and readers within our teams. Faker and factory_boy are two of our favorite packages for testing Python projects. Together, they generate fake instances of the models we use in testing.
Faker is a Python package that allows you to generate fake data for many different kinds of fields, like usernames, dates, and URLs. factory_boy is another Python package that integrates Faker's data generation into your code through factory classes.
What we love about factory_boy, in particular, is that it allows a test author to focus on pinning the data they care about within their tests, while leaving Faker to generate all the other data that the test does not care about. This greatly improves test readability by reducing the required lines of code and removing noise from fields you do not need to worry about.
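The core idea of "pin what matters, generate the rest" can be sketched without any third-party packages. The `User` dataclass and `make_user` helper below are hypothetical stand-ins, not factory_boy's implementation: every field gets a random default, and any keyword argument the test passes overrides it.

```python
import random
import string
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical stand-in for django.contrib.auth.models.User."""
    first_name: str
    last_name: str
    username: str
    email: str
    is_staff: bool


def _random_name(rng: random.Random, length: int = 8) -> str:
    # A crude capitalized random "name"; Faker produces far more realistic values
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length)).capitalize()


def make_user(rng: Optional[random.Random] = None, **overrides) -> User:
    """Build a User with random defaults; keyword arguments pin specific fields."""
    rng = rng or random.Random()
    first, last = _random_name(rng), _random_name(rng)
    fields = {
        "first_name": first,
        "last_name": last,
        "username": f"{first}.{last}".lower(),
        "email": f"{first}.{last}@example.com".lower(),
        "is_staff": False,
    }
    fields.update(overrides)  # values pinned by the test win over generated ones
    return User(**fields)


# Only the field the test cares about is spelled out
user = make_user(email="sherlock.holmes@baker.street")
assert user.email == "sherlock.holmes@baker.street"
```

Passing a seeded `random.Random` makes the generated defaults reproducible, which helps when a failing test needs to be rerun with the same data.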
To see the difference in action, compare a test written with factory_boy to one written without it in the following examples.
Without factories:
```python
from django.contrib.auth.models import User


def test_correct_email_address():
    user = User(
        first_name="Sherlock",
        last_name="Holmes",
        username="sherlock.holmes",
        email="sherlock.holmes@baker.street",
        is_staff=False,
    )

    # has_valid_email is the function under test, defined elsewhere in the project
    assert has_valid_email(user) is True
```
With factories:
```python
from tests.factories import UserFactory


def test_correct_email_address():
    user = UserFactory(email="sherlock.holmes@baker.street")
    assert has_valid_email(user) is True
```
The test that imports UserFactory checks exactly the same behavior as the one that does not, because the email address is the only field the assertion depends on. However, it is shorter, easier to read, and clearly displays the fields that matter to the test. In comparison, the non-factory test is longer and makes it difficult to understand which fields actually matter for the purposes of the test. This is a fairly simple example, but the difference becomes even more pronounced as test complexity increases.
The UserFactory class can be defined once in tests/factories.py and reused in all of your tests:
```python
import factory
from django.contrib.auth.models import User
from factory.django import DjangoModelFactory


class UserFactory(DjangoModelFactory):
    class Meta:
        model = User

    # Any field not passed explicitly to UserFactory(...) is filled in by Faker
    username = factory.Faker("slug")
    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")
    email = factory.Faker("email")
    is_staff = False
```
When dealing with security data, we often need to generate values for security fields like CVSSv3 vectors and CVE identifiers. Faker does not provide this data by default, but it does allow you to add your own providers, which is exactly where faker-security comes into play.
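To make the target formats concrete, here is a minimal standard-library sketch of the kind of identifiers such a provider returns. The names `fake_cve_id` and `fake_cwe_id` are hypothetical, this is not how faker-security implements them, and the results are only syntactically valid: they are not guaranteed to have been assigned to a real vulnerability or weakness.

```python
import random
from typing import Optional


def fake_cve_id(rng: Optional[random.Random] = None,
                min_year: int = 1999, max_year: int = 2022) -> str:
    """Return a syntactically valid CVE identifier such as 'CVE-2021-0042'."""
    rng = rng or random.Random()
    year = rng.randint(min_year, max_year)
    # The sequence part has at least four digits (zero-padded) and may be longer
    sequence = rng.randint(1, 99_999)
    return f"CVE-{year}-{sequence:04d}"


def fake_cwe_id(rng: Optional[random.Random] = None, max_id: int = 1000) -> str:
    """Return a CWE-style identifier such as 'CWE-79' (max_id is an arbitrary bound)."""
    rng = rng or random.Random()
    return f"CWE-{rng.randint(1, max_id)}"


rng = random.Random(2022)  # a fixed seed makes the output reproducible
print(fake_cve_id(rng), fake_cwe_id(rng))
```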
What is faker-security?
faker-security is a Python package that acts as a Faker provider, allowing you to randomly generate security-related data for your projects. Currently, faker-security supports data generation for:
CVSSv3 vectors
CVSSv2 vectors
semver versions
NPM semver version ranges
CVEs
CWEs
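CVSS vectors are the most structured of these formats: each is a fixed, ordered list of base metrics, and each metric takes one of a small set of values. A minimal sketch that picks a random value per metric, using the base metric abbreviations from the CVSS v3.1 and v2 specifications, might look like this. The function names are hypothetical, it is not faker-security's implementation, and temporal and environmental metrics are left out.

```python
import random
from typing import Dict, List, Optional

# Base metrics and their allowed values, in the order the specifications use
CVSS3_BASE_METRICS: Dict[str, List[str]] = {
    "AV": ["N", "A", "L", "P"],  # Attack Vector
    "AC": ["L", "H"],            # Attack Complexity
    "PR": ["N", "L", "H"],       # Privileges Required
    "UI": ["N", "R"],            # User Interaction
    "S": ["U", "C"],             # Scope
    "C": ["H", "L", "N"],        # Confidentiality impact
    "I": ["H", "L", "N"],        # Integrity impact
    "A": ["H", "L", "N"],        # Availability impact
}

CVSS2_BASE_METRICS: Dict[str, List[str]] = {
    "AV": ["L", "A", "N"],  # Access Vector
    "AC": ["H", "M", "L"],  # Access Complexity
    "Au": ["M", "S", "N"],  # Authentication
    "C": ["N", "P", "C"],   # Confidentiality impact
    "I": ["N", "P", "C"],   # Integrity impact
    "A": ["N", "P", "C"],   # Availability impact
}


def _random_vector(metrics: Dict[str, List[str]], rng: random.Random) -> str:
    # Join "NAME:VALUE" pairs with "/" in the dictionary's (insertion) order
    return "/".join(f"{name}:{rng.choice(values)}" for name, values in metrics.items())


def fake_cvss3_vector(rng: Optional[random.Random] = None, version: str = "3.1") -> str:
    rng = rng or random.Random()
    return f"CVSS:{version}/" + _random_vector(CVSS3_BASE_METRICS, rng)


def fake_cvss2_vector(rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    return _random_vector(CVSS2_BASE_METRICS, rng)


print(fake_cvss3_vector())
print(fake_cvss2_vector())
```

Because every combination of base metric values is a valid vector, choosing each metric independently is enough; no combinations need to be filtered out.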
In the future, we hope to cover more generation methods and more types of version ranges, such as Maven version ranges.
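Versions and version ranges follow their own small grammars. The sketch below generates MAJOR.MINOR.PATCH versions and three common npm range forms: caret, tilde, and a bounded comparator pair. The function names are hypothetical, it is not faker-security's implementation, and it covers only a small subset of npm's range syntax (no prerelease tags, `||` alternatives, or hyphen ranges).

```python
import random
from typing import Optional


def fake_semver(rng: Optional[random.Random] = None, max_part: int = 20) -> str:
    """Return a MAJOR.MINOR.PATCH version string such as '3.14.0'."""
    rng = rng or random.Random()
    major, minor, patch = (rng.randint(0, max_part) for _ in range(3))
    return f"{major}.{minor}.{patch}"


def fake_npm_range(rng: Optional[random.Random] = None) -> str:
    """Return a caret, tilde, or bounded npm range built around a random version."""
    rng = rng or random.Random()
    low = fake_semver(rng)
    style = rng.choice(["caret", "tilde", "bounded"])
    if style == "caret":
        return f"^{low}"  # e.g. '^1.2.3'
    if style == "tilde":
        return f"~{low}"  # e.g. '~1.2.3'
    # A comparator set: at least `low`, and below the next major version
    next_major = int(low.split(".")[0]) + 1
    return f">={low} <{next_major}.0.0"


print(fake_semver(), fake_npm_range())
```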
Building on our previous examples, if we want a VulnerabilityFactory that generates fake vulnerability data, we would define it as follows:
```python
import factory
from factory.django import DjangoModelFactory
from faker_security.providers import SecurityProvider

from myproject.models import Vulnerability  # your own Django model

# Register faker-security's provider with factory_boy's Faker wrapper
factory.Faker.add_provider(SecurityProvider)


class VulnerabilityFactory(DjangoModelFactory):
    class Meta:
        model = Vulnerability

    cvss_v3_vector = factory.Faker("cvssv3")
    cve_id = factory.Faker("cve")
    cwe_id = factory.Faker("cwe")
```
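Because these field values are random, a test sometimes needs to check their shape rather than a specific value. The sketch below is a minimal format check, assuming the CVSS vector is a v3.0 or v3.1 base vector that carries the `CVSS:3.x/` prefix. `looks_like_vulnerability` and the regular expressions are hypothetical helpers, not part of faker-security or factory_boy; in a real test you would pass in the `cve_id`, `cwe_id`, and `cvss_v3_vector` attributes of an instance built by VulnerabilityFactory.

```python
import re

# Hypothetical format checks for the standard textual representations
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")
CWE_RE = re.compile(r"CWE-\d+")
# Assumes a CVSS v3.0/v3.1 base vector with the "CVSS:3.x/" prefix
CVSS3_RE = re.compile(
    r"CVSS:3\.[01]/AV:[NALP]/AC:[LH]/PR:[NLH]/UI:[NR]"
    r"/S:[UC]/C:[HLN]/I:[HLN]/A:[HLN]"
)


def looks_like_vulnerability(cve_id: str, cwe_id: str, cvss_v3_vector: str) -> bool:
    """Return True if all three fields match their expected formats."""
    return bool(
        CVE_RE.fullmatch(cve_id)
        and CWE_RE.fullmatch(cwe_id)
        and CVSS3_RE.fullmatch(cvss_v3_vector)
    )
```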
How to use faker-security
faker-security can be installed via pip:
```shell
pip install faker-security
```
If you want to use it within your project, add it to your dependency file of choice. This is typically your project's requirements.txt file. If you are using a higher-level package manager like poetry or pipenv, follow its instructions for adding new packages.
Once installed, you just need to configure Faker or factory_boy to make use of faker-security.
If you are running tests with pytest, we recommend registering faker-security with factory_boy in your conftest.py file as follows:
```python
import factory
from faker_security.providers import SecurityProvider


def pytest_configure():
    # Runs once when pytest starts, before any tests are collected
    factory.Faker.add_provider(SecurityProvider)
```
Moving forward with faker-security
Using factory_boy and Faker is a great way to simplify your tests, and with faker-security, you now have a quick and easy way to generate fake security data for all your projects!
We hope you find this package as useful as we do and would love to have you contribute! Please star our GitHub repo and send pull requests and contributions. Happy testing!