Python security best practices cheat sheet

Written by:

Frank Fischer

September 27, 2021

0 mins read

wordpress-sync/cheat-sheet-python-security-2021-pdf

In 2019, Snyk released its first Python cheat sheet. Since then, many aspects of Python security have changed. Using our learnings as a developer security company — as well as Python-specific best practices — we compiled this updated cheat sheet to make sure you keep your Python code secure. And before going any further, I need to give special thanks to Chibo and Daniel for their help with this cheat sheet!

Download the 2021 Python Security Best Practices Cheat Sheet

Here are the Python security tips we’ll explore:

Always sanitize external data
Scan your code
Be careful when downloading packages
Review your dependency licenses
Do not use the system standard version of Python
Use Python’s capability for virtual environments
Set DEBUG = False in production
Be careful with string formatting
(De)serialize very cautiously
Use Python type annotations

One quick note before we get started. It’s important to note that Snyk’s data about the Python ecosystem, as well as academic research, shows that Python is no more (or less) secure than other widely used languages. This cheat sheet is just specifically for our Pythonistas. We’d recommend checking out all of our other security cheat sheets to learn how to stay safe in other ecosystems.

1. Always sanitize external data

One vector of attack for any application is external data, which can be used for injection, XSS, or denial of service (DOS) attacks. A general rule for maintaining Python security is to always sanitize data (remove sensitive information) from external sources whether the data originates from a user input form, scraping a website, or a database request. Also, sanitize as soon as the data enters the application to prevent insecure handling. This reduces the risk that unsanitized sensitive data will be handled by your application accidentally.

Starting with sanitization, it always makes more sense to check for what the input should be than to try to handle the exceptions. We also recommend using well-maintained libraries for sanitization. Here are two:

schema is “a library for validating Python data structures, such as those obtained from config-files, forms, external services or command-line parsing, converted from JSON/YAML (or something else) to Python data-types.
bleach is “an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes.”

Major frameworks come with their own sanitation functions, like Flask’s flask.escape() or Django’s django.utils.html.escape(). The goal of any of these functions is to secure potentially malicious HTML input like:

1>>> import bleach
2>>> bleach.clean('an XSS <script>navigate(...)</script> example')
3'an XSS &lt;script&gt;navigate(...)&lt;/script&gt; example'

The limitation of this approach is that libraries are not good for everything. They are specialized in their domain. As a note, be sure to read to the end of this post to get more information about working with other data formats, such as XML, which can also contain malicious data.

Another often used option is to leave the rendering of HTML to templating engines such as Jinja. It provides lots of capabilities, and amongst them is auto-escaping to prevent XSS using MarkupSafe.

Another aspect of sanitization is preventing data from being used as a command. A typical example is an SQL injection. Instead of stitching strings and variables together to generate an SQL query, it is advisable to use named-parameters to tell the database what to treat as a command and what as data.

1# Instead of this …
2cursor.execute(f"SELECT admin FROM users WHERE username = '{username}'");
3# ...do this...
4cursor.execute("SELECT admin FROM users WHERE username = %(username)s", {'username': username});

Or even better, use Object-Relational Mapping (ORM), such as sqlalchemy, which would make the example query look like this:

1query = session.query(User).filter(User.name.like('%{username}'))

Here you get more readable code, as well as ORM optimizations like caching, plus more security and performance!

If you want to learn more, check out our SQL injection cheat sheet.

2. Scan your code

Developers have a wide array of static code analysis tools at their disposal for maintaining Python security. Let’s take a look at three different levels of tools.

First, the linter level. PEP8 has been serving for decades now as a style guide for Python. Various tools are available (and built into IDEs) to check against this style guide, like pep8, pylint, flake8, and more.

Next, tools like bandit transform code into an abstract syntax tree (AST) and perform queries on it to find typical security issues. This is a level above what typical linters do, which work on a syntactical level. Still, bandit is limited by its intermediate representation and performance. For example, bandit cannot detect data flow related issues (known as taint-analysis) and these result in devastating flaws (injections like SQL injection or XSS as an example).

Finally, Static Application Security Testing (SAST) tools like Snyk Code run a semantic analysis, taking even complex interfile issues into account. Unlike other tools on this level, Snyk Code is developer-friendly by scanning fast and integrating into the IDE (or your command line directly). Snyk Code explains its highly accurate findings and provides help, including examples how to fix your Python security problems. And to top that, it’s easy to get started with and free to use on open source (plus a limited amount of non-OSS tests).

3. Be careful when downloading packages

It is easy to install packages, but they’re also an easy way to introduce Python security vulnerabilities. Typically, developers use the standard package installer for Python (pip) which uses the Python Pack Index (PyPI). This makes it important to understand how packages are added to PyPI.

PyPI has a procedure for reporting security concerns. If someone reports a malicious package, or a problem within PyPI, it is addressed, but packages added to PyPI do not undergo review — this would be an unrealistic expectation of the volunteers who maintain PyPI.

Therefore, it is wise to assume that there are malicious packages within PyPI and you should act accordingly. Reasonable steps include doing a bit of research on the package you want to install and ensuring that you carefully spell out the package name (a package named for a common misspelling of a popular package could execute malicious code). Before downloading a package, make sure to check it on Snyk Advisor.

wordpress-sync/blog-python-cheat-sheet-advisor

Doing a quick search for a package on Snyk Advisor gives you a lot of information on the package, its support in the community, its history of bugs and fixes, and a lot more. Snyk Advisor also provides the installation command at the top of the result page. It is a best practice to copy and paste that spelling to prevent typosquatting. Snyk Advisor can tell you whether or not you should trust a package. You can see the history of security issues and the time it took to get them fixed.

Another best practice is to use virtual environments to isolate projects from each other. Also, use pip freeze or a comparable command to record changes in the environment in the requirement list.

Maintaining references in an up-to-date manner, Snyk Open Source is based on an industry-leading vulnerability database recording security issues and possible fixes. Snyk Open Source runs scans using the requirements and provides actionable information about discovered vulnerabilities of direct and transitive dependencies and helps you to fix them right away.

4. Review your dependency licenses

When considering using an open source project, it is important to understand how these projects are licensed. Open source projects are free and available to use, but there may still be terms and conditions applied. These terms usually involve how the software is used, whether you need to make any changes you make to the software publicly available, and other similar requirements. You should become familiar with the open source licenses necessary for the projects you use, so you are sure that you are not compromising yourself legally.

If the project adopts a more restrictive license than you anticipated (GPL, SSPL, etc.), you can end up cornering yourself, leaving you to either comply with the terms of the license or cease using the project. Additionally, if you need to make changes to a project that does not have a license, you might run afoul with copyright law.

To ensure that your project is sustainable and you do not expose yourself to unnecessary Python security and legal risks, scan and fix license and vulnerability issues in your project’s dependencies.

Snyk Open Source can help you with open source license compliance management. It provides a developer-friendly way to gain end-to-end visibility while providing a flexible governance.

5. Do not use the system standard version of Python

Most POSIX systems come preloaded with a version of Python. The problem with most built-in Python distributions is that they aren’t current.

So, make sure to use the latest version of Python available for your system as well as the official containers designed to run Python and keep it updated. Snyk is here to help you. Scan your containers for necessary updates using Snyk Container and check your dependencies using Snyk Open Source.

6. Use Python virtual environments

Python is equipped to separate application development into virtual environments. A virtual environment isolates the Python interpreter, libraries, and scripts installed into it. This means that instead of using a global Python version and global Python dependencies for all your projects, you can have project-specific virtual environments that can use their own Python (and Python dependency) versions!

Most IDEs, CLIs, and dashboards such as Anaconda Navigator have built-in functions to switch between virtual environments.

Pro Tip: As of Python version 3.5, the use of venv is recommended and with version 3.6 pyvenv was deprecated.

Virtual environments make developing, packaging, and shipping secure Python applications easier. Using them is highly recommended. See the Python venv doc for more details.

7. Set `DEBUG = False` in production

In a development environment, it makes sense to have verbose error messages. In production though, you want to prevent any leaks of information that might help an attacker to learn more about your environment, libraries, or code.

By default, most frameworks have debugging switched on. For example, Django has it enabled in settings.py. Make sure to switch debugging to False in production to prevent leaking sensitive application information to attackers.

Pro Tip: When deploying to production, it is useful to have your continuous deployment system verify this setting is disabled post-deployment.

8. Be careful with string formatting

Despite Python’s idea of having one — and only one — way to do things, it actually has four different ways to format strings (three methods for versions prior to Python 3.6).

String formatting has gotten progressively more flexible and powerful (f-strings are particularly interesting), but as flexibility increases, so does the potential for exploits. For this reason, Python users should carefully consider how they format strings with user-supplied input.

Python has a built-in module named string. This module includes the Template class, which is used to create template strings.

Consider the following example.

1from string import Template
2greeting_template = Template(“Hello World, my name is $name.”)
3greeting = greeting_template.substitute(name=”Hayley”)

For the above code, the variable greeting is evaluated as: “Hello World, my name is Hayley.”

This string format is a bit cumbersome because it requires an import statement and is less flexible with types. It also doesn’t evaluate Python statements the way f-strings do. These constraints make template strings an excellent choice when dealing with user input.

Another quick note about string formatting: Be extra careful with raw SQL as mentioned above.

9. (De)serialize very cautiously

Python provides a built-in mechanism to serialize and deserialize Python objects called “pickling” using the pickle module. This is known to be insecure and it is advisable to use it very cautiously and only on trusted data sources.

The new de facto standard for serialization/deserialization is YAML. The PyYAMLpackage provides a mechanism to serialize custom data types to YAML and back again. But PyYAML is riddled with various possible attack vectors. A simple but effective way to secure the usage of PyYAML is using yaml.SafeLoader() instead of yaml.Loader() as a loader.

1Data = yaml.load(input_file, Loader=yaml.SafeLoader)

This prevents loading of custom classes but supports standard types like hashes and arrays.

Another typical use case is XML. Standard libraries are often used but are vulnerable to typical attacks — namely DOS attacks or external entity expansion (an external source is references). A good first line of defense is a package called defusedxml. It has safeguards against these typical XML security issues.

Bonus, non-security tip: Use Python type annotations

With version 3.5, type hints were introduced. While the Python runtime does not enforce type annotations, tools such as type checkers, IDEs, linters, SASTs, and others can benefit from the developer being more explicit. Here is an example to highlight the idea:

1MODE = Literal['r', 'rb', 'w', 'wb']
2def open_helper(file: str, mode: MODE) -> str:
3    ...
4open_helper('/some/path', 'r')  # Passes type check
5open_helper('/other/path', 'typo')  # Error in type checker

Literal[...] was introduced with version 3.8 and is not enforced by the runtime (you can pass whatever string you want in our example) but type checkers can now discover that the parameter is outside the allowed set and warn you. This is a great piece of functionality, and not just for Python security.

Note: As it is not enforced by the runtime, the security usage of type hints is limited.

Live Hack: Exploiting AI-Generated Code

Gain insights into best practices for utilizing generative AI coding tools securely in our upcoming live hacking session.