Generating fake security data with Python and faker-security

Written by

Michael Aquilina

April 26, 2022

Snyk recently open sourced our faker-security Python package to help anyone working with security data. In this blog post, we’ll briefly go over what this Python package is and how to use it. But first, we’ll get some context for how the factory_boy Python package can be used in combination with faker-security to improve your test-writing experience during development.

Note: Some knowledge of Python is helpful for getting the most out of this post.

Testing with Faker and factory_boy

Before diving into faker-security, it’s helpful to start with what factory_boy and Faker are and how we use them within Snyk.

Snyk believes strongly in the ability of automated tests to make our code maintainable. Tests allow us to iterate and develop features quickly, and confidently make changes to our code without fearing we may inadvertently break existing features in the process.

Our commitment to testing drives us to find new ways to simplify the testing experience for the test writers and readers within our teams. Faker and factory_boy are two of our favorite packages for testing Python projects. Together, they generate fake instances of models we use in testing.

Faker is a Python package that allows you to generate fake data for many different kinds of fields, like usernames, dates, and URLs. factory_boy is another Python package that helps integrate Faker’s data generation into your code by defining factory classes.

What we love about factory_boy, in particular, is that it allows a test author to focus on pinning the data they care about within their tests, while leaving Faker to generate all the other data that the test does not care about. This greatly improves test readability by reducing the required lines of code and removing noise from fields you do not need to worry about.

To see the difference in action, compare a test that’s written with factory_boy to one that isn’t in the following examples.

Without factories:

from django.contrib.auth.models import User

# has_valid_email is the function under test; its import is omitted here.

def test_correct_email_address():
    user = User(
        first_name="Sherlock",
        last_name="Holmes",
        username="sherlock.holmes",
        email="sherlock.holmes@baker.street",
        is_staff=False,  # Django's built-in User has is_staff, not is_admin
    )

    assert has_valid_email(user) is True

With factories:

from tests.factories import UserFactory

def test_correct_email_address():
    user = UserFactory(email="sherlock.holmes@baker.street")
    assert has_valid_email(user) is True

For the purposes of this assertion, the test that uses UserFactory is equivalent to the one that does not: both exercise has_valid_email with the same email address. However, it is shorter, easier to read, and clearly displays the fields that matter to the test. In comparison, the non-factory test is longer and makes it hard to tell which fields actually matter. This is a fairly simple example, but the difference becomes even more pronounced as test complexity increases.

The UserFactory class can be defined in tests/factories.py once and re-used in all of your tests:

import factory
from django.contrib.auth.models import User
from factory.django import DjangoModelFactory

class UserFactory(DjangoModelFactory):
    class Meta:
        model = User

    # Each factory.Faker field is filled with freshly generated data
    # unless the test passes an explicit value for it.
    username = factory.Faker("slug")
    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")
    email = factory.Faker("email")
    is_staff = False  # Django's built-in User has is_staff, not is_admin
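
The "pin what matters, generate the rest" idea can also be sketched without Django or factory_boy. The following is only an illustration using the standard library: the User dataclass, the make_user helper, and the example.com email domain are hypothetical stand-ins, not part of factory_boy or Faker, and the real packages generate far more realistic data.

```python
import random
import string
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Plain stand-in for a model class (hypothetical, not Django's User).
    username: str
    first_name: str
    last_name: str
    email: str


def _random_word(rng: random.Random, length: int = 8) -> str:
    """Return a random lowercase word of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_user(rng: Optional[random.Random] = None, **pinned) -> User:
    """Build a User with random defaults, overridden by any pinned fields."""
    rng = rng or random.Random()
    fields = {
        "username": _random_word(rng),
        "first_name": _random_word(rng).capitalize(),
        "last_name": _random_word(rng).capitalize(),
        "email": f"{_random_word(rng)}@example.com",
    }
    fields.update(pinned)  # values pinned by the test win over generated ones
    return User(**fields)


# Only the email matters to this (hypothetical) test, so only the email is pinned.
user = make_user(email="sherlock.holmes@baker.street")
```

Every field the caller does not pin still gets a value, which is the property that keeps the factory-based test short.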

When dealing with security data, we often need to generate data for security fields like CVSSv3 vectors and CVE identifiers. Faker does not provide this data by default, but it does allow you to add your own providers, which is exactly where faker-security comes into play.

What is faker-security?

faker-security is a Python package that acts as a Faker provider, allowing you to randomly generate security-related data for your projects. Currently, faker-security supports data generation for:

CVSSv3 vectors
CVSSv2 vectors
semver versions
NPM semver version ranges
CVEs
CWEs

In the future, we hope to cover more generation methods and more types of version ranges, such as Maven's.

Building on our previous examples, if we want to create a VulnerabilityFactory of some kind to generate fake data, we would define it as follows:

import factory
from factory.django import DjangoModelFactory
from faker_security.providers import SecurityProvider
from myproject.models import Vulnerability

# Register the security provider so factory.Faker can resolve its generators.
factory.Faker.add_provider(SecurityProvider)

class VulnerabilityFactory(DjangoModelFactory):
    class Meta:
        model = Vulnerability

    cvss_v3_vector = factory.Faker("cvssv3")
    cve_id = factory.Faker("cve")
    cwe_id = factory.Faker("cwe")
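
For a feel of what CVSSv3 vectors, CVE identifiers, CWE identifiers, and semver versions look like, here is a rough standard-library sketch. It is not how faker-security is implemented: the function names, the year and number ranges, and the choice of CVSS version 3.1 are assumptions made for illustration, and the CWE numbers are not checked against the real CWE list.

```python
import random

# CVSS v3.1 base metrics and their allowed values, in vector order.
CVSS3_BASE_METRICS = {
    "AV": ["N", "A", "L", "P"],  # attack vector
    "AC": ["L", "H"],            # attack complexity
    "PR": ["N", "L", "H"],       # privileges required
    "UI": ["N", "R"],            # user interaction
    "S": ["U", "C"],             # scope
    "C": ["H", "L", "N"],        # confidentiality impact
    "I": ["H", "L", "N"],        # integrity impact
    "A": ["H", "L", "N"],        # availability impact
}


def fake_cvss3_vector(rng: random.Random) -> str:
    """Return a syntactically valid CVSS v3.1 base vector."""
    parts = [
        f"{metric}:{rng.choice(values)}"
        for metric, values in CVSS3_BASE_METRICS.items()
    ]
    return "CVSS:3.1/" + "/".join(parts)


def fake_cve_id(rng: random.Random) -> str:
    """Return an identifier shaped like CVE-YYYY-NNNN (at least 4 digits)."""
    year = rng.randint(1999, 2022)   # assumed year range
    number = rng.randint(1, 99999)   # assumed sequence range
    return f"CVE-{year}-{number:04d}"


def fake_cwe_id(rng: random.Random) -> str:
    """Return an identifier shaped like CWE-NNN (not validated)."""
    return f"CWE-{rng.randint(1, 1400)}"


def fake_semver(rng: random.Random) -> str:
    """Return a MAJOR.MINOR.PATCH version string."""
    return ".".join(str(rng.randint(0, 20)) for _ in range(3))


rng = random.Random(0)  # seeded so repeated runs produce the same values
sample = {
    "cvss_v3_vector": fake_cvss3_vector(rng),
    "cve_id": fake_cve_id(rng),
    "cwe_id": fake_cwe_id(rng),
    "version": fake_semver(rng),
}
```

Values like these are well-formed but meaningless, which is usually all a test needs from the fields it does not pin.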

How to use faker-security

faker-security can be installed via pip:

pip install faker-security

If you want to use it within your project, add it to your dependency file of choice. This is typically your project’s requirements.txt file. If you are using a higher-level package manager like poetry or pipenv, follow their instructions for adding new packages.

Once installed, you just need to configure Faker or factory_boy to make use of faker-security.

If you are running tests with pytest, we recommend setting up faker-security for factory_boy in your conftest.py file as follows:

import factory
from faker_security.providers import SecurityProvider

def pytest_configure():
    factory.Faker.add_provider(SecurityProvider)

Moving forward with faker-security

Using factory_boy and Faker is a great way to simplify your tests, and with faker-security, you now have a quick and easy way to generate fake security data for all your projects!

We hope you find this package as useful as we do and would love to have you contribute! Please star our GitHub repo and send pull requests and contributions. Happy testing!
