Python Security Best Practices Cheat Sheet
In this installment of our cheat sheet series, we’re going to cover the best practices for securely using Python.
1. Use Python 3
What version of Python are you using?
Although Python 3 has been out for more than a decade, many people and companies are still running Python 2.7 in production. As of the time of this writing, Python 2.7 is still officially supported. The Python Software Foundation has announced that support for Python 2 ends January 1, 2020. If you have not upgraded by then, you leave yourself open to security vulnerabilities, both within the language and within other open source projects that are unlikely to maintain compatibility with Python 2.7.
For instance, Django 1.11 is the last version of Django that is compatible with Python 2.7. Long term support for Django 1.11 is promised through at least April 2020, but it would not be wise to rely on security support beyond that point.
The transition to Python 3 has not been easy for the community. The breaking changes introduced in Python 3 mean that a software developer needs to be sure that their legacy codebases are ready for upgrade, and also that all of their open source dependencies are compatible with Python 3.
If you are still running Python 2.7, now is the time to prioritize your technical debt and upgrade.
2. Scan your code with Bandit
A simple way to find security vulnerabilities within your Python code is to run a scan with Bandit.
Bandit is an open source project that is available through the Python Package Index (PyPI). Bandit scans each .py file and builds a corresponding abstract syntax tree (AST). Bandit then runs a number of plugins against the AST to find common software security problems. For example, one plugin can detect whether you are using Flask (a micro-framework for Python) with the debug setting equal to True.
Bandit works either as a local tool to be used as you develop, or as part of your CI/CD (continuous integration/continuous delivery) pipeline. You can create a YAML configuration file to control how Bandit behaves in these different scenarios. In this file you can also indicate a list of tests to skip. This functionality should be used with caution.
There is no guarantee that Bandit will catch all security problems—there are a finite number of plugins that it runs, and you could potentially have an issue in your code that doesn’t register against any of the available plugins. However, it is easy to use and an excellent screen for common issues.
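The AST-based approach described above can be illustrated with a toy check built on the standard library ast module. This is not Bandit’s own code or plugin API; the function name find_debug_true_calls and the sample source are hypothetical, and the check is far cruder than a real plugin.

```python
import ast
from typing import List

# Hypothetical source to scan. It is only parsed, never executed,
# so Flask does not need to be installed.
SOURCE = '''
from flask import Flask
app = Flask(__name__)
app.run(debug=True)
'''


def find_debug_true_calls(source: str) -> List[int]:
    """Return line numbers of calls to a method named run() with debug=True."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only consider attribute calls such as app.run(...).
        if not (isinstance(func, ast.Attribute) and func.attr == "run"):
            continue
        for keyword in node.keywords:
            if (
                keyword.arg == "debug"
                and isinstance(keyword.value, ast.Constant)
                and keyword.value.value is True
            ):
                hits.append(node.lineno)
    return hits


print(find_debug_true_calls(SOURCE))  # prints [4]
```

Because the check works on the syntax tree rather than on raw text, it is not fooled by spacing or comments, but it also only finds the one pattern it was written for, which is the limitation noted above.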
3. Use Pipenv for environment and dependency management
No one likes it when something surprising happens in production. Ideally the developer’s local environment should be identical to the production environment in order to eliminate surprises. It can therefore be tempting to run pip freeze on your local machine, dump the resulting list of packages and versions into a requirements.txt, and then use that file to set up your production environment. This is easy to do, but it is not the most security-conscious option.
The process described above results in “pinning” your dependencies. When you pin your dependencies, you freeze your project to a moment in time. This is great for predictability, but leaves your project exposed as new security vulnerabilities are found and remediated for those open source dependencies.
Pipenv is a tool that manages the competing interests of having a predictable environment and having an up-to-date environment. It uses a two-file system (Pipfile and Pipfile.lock) that separates abstract dependency declarations from the last tested combination. Pipenv manages your installations and your virtual environment, displays your dependency tree, and can check your dependencies for known vulnerabilities.
4. Watch your import statements
Python imports are very flexible, but that flexibility has a security cost.
When importing in Python, you can use an absolute import or a relative import. An absolute import uses the full dotted path of the module that you want to import, starting at the project’s top-level package. If the module you want to import is not found at that location, an error occurs. Absolute imports are a good way to know exactly what you are importing.
A relative import starts at the path of the current module. There are two types of relative imports, explicit and implicit. Explicit relative imports specify the precise location of the module you want to import with respect to the current module. For example, you might have an import statement that looks like this: `from .. import my_module`. The dots indicate where to start: a single dot is the current package, and each additional dot moves one level up.
An implicit relative import does not specify a location relative to the current module. If a module with that name is found anywhere on the system path, it is imported, which could be dangerous. An attacker could create a malicious module with the same name as a popular module and then smuggle it into a popular open source library. If the malicious module is found on the system path before the real module, it is imported instead.
Import statements in Python execute the code in the imported module—this means that an implicit relative import could result in the execution of malicious code. For this reason, implicit relative imports are not supported in Python 3.
If you are using Python 2, eliminate the use of implicit relative imports. This is important for the current security of your project and because it is a necessary step towards upgrading to Python 3. If you are using Python 3, it is still important to keep in mind that import statements execute the code within the target module. Because of this, it makes sense to be careful with your import statements, regardless of the Python version that you are using.
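Both points above (an import runs the module’s top-level code, and the first match on the path wins) can be seen in a small, self-contained sketch. The module name gr_demo_shadowed_module and the two temporary directories are hypothetical stand-ins for a real module and a shadowing copy.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "gr_demo_shadowed_module"  # hypothetical name used only for this demo


def import_from_two_dirs() -> str:
    """Create two same-named modules and report which one the import picks up."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        # Each file has top-level code (a print and an assignment) that runs on import.
        Path(first, MODULE_NAME + ".py").write_text(
            "print('top-level code ran')\nORIGIN = 'first directory on the path'\n"
        )
        Path(second, MODULE_NAME + ".py").write_text(
            "print('top-level code ran')\nORIGIN = 'second directory on the path'\n"
        )
        # Put both directories at the front of the search path, in this order.
        sys.path[:0] = [first, second]
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(MODULE_NAME)
            return module.ORIGIN
        finally:
            # Undo the changes so the demo leaves no trace.
            sys.path.remove(first)
            sys.path.remove(second)
            sys.modules.pop(MODULE_NAME, None)


print(import_from_two_dirs())  # prints "first directory on the path"
```

The copy in the second directory is never consulted, which is why a same-named module placed earlier on the path silently replaces the one you intended to import.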
5. Be careful when downloading packages
It is easy to install Python packages. Typically developers use the standard package installer for Python (pip), although Pipenv as discussed above is a great alternative. Regardless of whether you use pip or Pipenv, it is important to understand how packages are added to PyPI.
PyPI has a procedure for reporting security concerns. If someone reports a malicious package or a problem within PyPI it is addressed, but packages added to PyPI do not undergo review—this would be an unrealistic expectation of the volunteers who maintain PyPI.
Therefore it is wise to assume that there are malicious packages within PyPI and behave accordingly. Reasonable steps include doing a bit of research on the package you want to install and ensuring that you carefully spell out the package name (a package named for a common misspelling of a popular package could execute malicious code).
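One way to make the spelling check above less error-prone is to compare a requested name against a short list of names you already trust. The sketch below is only an illustration: the KNOWN_PACKAGES list, the possible_typosquat function, and the 0.8 similarity cutoff are all assumptions, and a near match is a prompt to look more closely, not proof of malice.

```python
import difflib

# Hypothetical allowlist; in practice this would be the packages your team already uses.
KNOWN_PACKAGES = ("requests", "numpy", "django", "flask")


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None."""
    normalized = name.lower()
    if normalized in known:
        return None  # exact match with a trusted name
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(possible_typosquat("reqeusts"))  # prints "requests"
print(possible_typosquat("requests"))  # prints None
```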
6. Handle requests safely
HTTP requests are typically handled in Python through the requests library. It is important to understand how this library handles certain security issues so you can be sure that you are getting the full security benefit from the requests module.
The requests library handles SSL certificate verification for you! This library uses a package called certifi to validate the trustworthiness of certificate authorities. For this reason, keep your connections secure by maintaining the most updated version of certifi. (Hint: Do not pin this dependency!)
When making a request, it is possible to bypass the SSL certificate verification. The default, however, is to check, and you should only break this rule if you trust the source.
To bypass certificate verification, pass verify=False to the request call, for example `requests.get(url, verify=False)`.
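A minimal sketch of the default and the bypass, assuming the requests package is installed. The function name fetch_without_verification and the 10-second timeout are hypothetical choices for illustration; the function is defined but not called, so the sketch makes no network requests.

```python
import requests

# A new session verifies certificates unless told otherwise.
session = requests.Session()
print(session.verify)  # prints True


def fetch_without_verification(url):
    """Fetch a URL with certificate verification switched off.

    Only appropriate for a source you already trust, such as a local
    test server with a self-signed certificate.
    """
    return requests.get(url, verify=False, timeout=10)
```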
7. Be careful with string formatting
Despite Python’s ideal of having one and only one way to do things, it actually has four different ways to format strings (three methods for versions prior to Python 3.6).
String formatting has gotten progressively more flexible and powerful (f-strings are particularly interesting) but as flexibility increases, so does the potential for exploits. For this reason, Python users should carefully consider how they format strings with data entered by users.
Python has a built-in module named string. This module includes the Template class, which is used to create template strings.
Consider the following example.
from string import Template
greeting_template = Template("Hello World, my name is $name.")
greeting = greeting_template.substitute(name="Hayley")
For the above code, the variable greeting evaluates to:
"Hello World, my name is Hayley."
This string format is a bit cumbersome because it requires an import statement and it is less flexible with types. It also doesn’t evaluate Python expressions the way f-strings do, and it does not allow attribute or index lookups inside a placeholder. These constraints make template strings an excellent choice when dealing with user input.
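The difference matters most when the format string itself comes from a user. The sketch below contrasts str.format with Template; the User class, its api_token attribute, and the token value are hypothetical.

```python
from string import Template


class User:
    def __init__(self, name, api_token):
        self.name = name
        self.api_token = api_token  # hypothetical sensitive attribute


user = User("Hayley", "hypothetical-token")

# Suppose this format string was supplied by an untrusted user.
# str.format follows attribute lookups, so it can read any attribute
# of the objects passed to it.
untrusted_format = "Hello, {user.api_token}"
leaked = untrusted_format.format(user=user)
print(leaked)  # prints "Hello, hypothetical-token"

# Template placeholders are plain names, so ".__class__" is treated as text.
untrusted_template = Template("Hello, $name.__class__")
rendered = untrusted_template.substitute(name=user.name)
print(rendered)  # prints "Hello, Hayley.__class__"
```

With Template, the untrusted text can only refer to the names you pass to substitute, so the set of values it can reach is exactly the set you chose to expose.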
Another quick note on string formatting: Be extra careful with raw SQL, and never build a query by formatting user input into the query string. Make your queries through an object-relational mapper (ORM) if at all possible.
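Where raw SQL cannot be avoided, a parameterized query is a common safeguard. It is not mentioned above and is swapped in here as an illustration, using the standard library sqlite3 module with a hypothetical users table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Hayley",), ("Sam",)])


def find_user(connection, name):
    """Look up a user by name, passing the value as a bound parameter."""
    query = "SELECT id, name FROM users WHERE name = ?"
    return connection.execute(query, (name,)).fetchall()


hostile_input = "x' OR '1'='1"

# Formatting the input into the SQL text changes the meaning of the query.
unsafe_query = f"SELECT id, name FROM users WHERE name = '{hostile_input}'"
unsafe_rows = conn.execute(unsafe_query).fetchall()
print(unsafe_rows)  # prints [(1, 'Hayley'), (2, 'Sam')]

# A bound parameter is treated purely as data, so nothing matches.
safe_rows = find_user(conn, hostile_input)
print(safe_rows)  # prints []
```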
8. Review your dependency licenses
When considering using an open source project, it is important to understand how it is licensed. Open source projects are free and available to use, but there may still be terms and conditions applied. These terms usually involve how the software is used, whether you must publish any changes you make to the software, and other similar requirements. You should become familiar with the licenses of the projects you use, so you are sure that you are not compromising yourself legally.
Snyk recently reviewed the project licenses available through PyPI. More than 10% of the available packages do not specify a license. These packages may have the appearance of open source software (they are freely available), but they are not explicitly open source.
If the project adopts a more restrictive license than you anticipated, you have essentially cornered yourself. You must either comply with the terms of the license or cease using the project. Additionally, if you need to make changes to a project that does not have a license, you might run afoul of copyright law.
To ensure that your project is sustainable and to protect yourself, know what licenses your dependencies use and comply with the terms.
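As a starting point for such a review, the standard library importlib.metadata module (Python 3.8 and later) can list the installed packages that declare no license in their metadata. This is a rough sketch: the function names are hypothetical, and the rule for what counts as "declared" (a License field, a License-Expression field, or a license classifier) is an assumption. Missing metadata does not prove that a project is unlicensed.

```python
from importlib import metadata


def has_license_info(meta):
    """Return True if package metadata declares a license in any common field."""
    license_field = str(meta.get("License") or "").strip()
    license_expression = str(meta.get("License-Expression") or "").strip()
    classifiers = meta.get_all("Classifier") or []
    return bool(
        (license_field and license_field.upper() != "UNKNOWN")
        or license_expression
        or any(c.startswith("License ::") for c in classifiers)
    )


def packages_without_declared_license():
    """List installed distributions whose metadata declares no license."""
    missing = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # skip distributions with unreadable metadata
            continue
        if not has_license_info(meta):
            missing.add(str(meta.get("Name") or "<unnamed>"))
    return sorted(missing)


for name in packages_without_declared_license():
    print(name)
```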
9. Deserialize selectively
Let’s consider Python’s pickle module, which is part of the standard library. Pickle allows you to serialize and deserialize a Python object structure. If you receive a pickled Python object structure from an untrusted source, deserializing that object can result in malicious code execution. There is no way to know whether the object structure that you are “unpickling” is malicious until it is too late. This behavior was recently found in NumPy, a popular package for scientific computing.
This kind of problem is not isolated to Python’s pickle module. When using PyYAML, a YAML parser and emitter for Python, be aware that it has a similar vulnerability. If you use yaml.load on a YAML file from an untrusted source, it could execute Python code. PyYAML gives users another option in yaml.safe_load, which should be your default.
Do not deserialize data from an untrusted source.
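The pickle risk can be shown with a harmless payload. In this sketch the Payload class is hypothetical, and eval on a fixed arithmetic string stands in for whatever callable an attacker would choose. The json comparison is an assumption added for contrast, since JSON is not discussed above.

```python
import json
import pickle


class Payload:
    """An object that tells pickle which callable to run when it is loaded."""

    def __reduce__(self):
        # pickle.loads will call eval("6 * 7"). An attacker could name
        # any importable callable and arguments here instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The receiver only sees bytes, yet loading them runs the sender's choice of code.
result = pickle.loads(blob)
print(result)  # prints 42

# A data-only format such as JSON yields plain dicts, lists, strings and numbers.
data = json.loads('{"name": "Hayley"}')
print(data)  # prints {'name': 'Hayley'}
```

Note that the loaded value is not even a Payload instance: the result is whatever the sender’s callable returned, which is why inspecting the object after loading comes too late.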
10. Keep up-to-date on vulnerabilities
The sooner you know about a vulnerability in an open source dependency, the sooner you can remediate the problem within your code. Sometimes this remediation means upgrading to a newer version of your open source dependency, sometimes it means applying a patch, and sometimes it means changing your own code to ensure that you are properly sanitizing your input and avoiding vulnerable functions.
You can read more about a recent example of this within the Python ecosystem here.
There are a number of options to keep yourself up-to-date—Snyk can help you do this, but it is not the only option. Pipenv has a vulnerability checker that utilizes the pyup.io API, for example.
Snyk is committed to helping developers understand security and bring it into their development lifecycle. Our tools help you find and fix vulnerabilities within your open source dependencies, and notify you when new vulnerabilities are found. Follow Snyk on Twitter to see other articles like this and consider trying our tools for yourself. It is free to try and free for open source!