The ultimate guide to creating a secure Python package
May 8, 2024
Creating a Python package involves several steps, such as choosing an appropriate directory structure, creating the package files, and configuring the package metadata before publishing it. You also need to create a subdirectory for tests and write clear documentation. Once the package is ready, you can upload it to a package index such as PyPI. With that, your Python package will be ready for others to install and use.
This guide will detail all the steps to build a modern Python package. You can find the code required to follow this tutorial in this GitHub repository. After implementing this tutorial, you'll realize that creating a Python package can be as simple as making use of an existing one, but first, let's start by learning how to use packages in Python.
How to create a Python package
Before you begin creating a Python package, you need to review the various methods of installation and importation, as well as the general structure of the packages.
Installing packages in Python
To use Python packages, you can utilize public-facing packages from PyPI or private indexes that contain proprietary code within an organization.
The following are two different ways to install a Python package:
Using PyPI
PyPI is the official repository that contains all public Python packages.
$ pip install package_name
Using a private index
In some scenarios, you may need to use publicly available packages from PyPI and privately available packages from a private index. Private indexes are typically used within large organizations for proprietary projects whose source code will not be shared publicly. These indexes allow you to store and distribute Python packages internally within your organization.
To install packages from a private index, you can use the `pip install` command with the `--index-url` flag, followed by the URL of the private index and the package name. The following is an example of the command you'd need to run to install a package:
$ pip install --index-url <private-index-url> package_name
It's recommended to use Transport Layer Security (TLS) on any private package index for security reasons. If your private index does not have a proper TLS certificate, you'll need to specify the `--trusted-host` flag in your `pip install` command to install packages from it. This is because pip will consider the index untrustworthy without a valid TLS certificate. Because this flag tells pip to trust the host without verifying its certificate, use it only on networks you control.
Importing a Python package
You can import a package in Python using the `import` statement. The Python interpreter will then search for the package in the module cache, the built-in modules, and the directories listed in `sys.path`. If the package is not found, an `ImportError` will be raised.
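The lookup behavior described above can be observed from Python itself. The following is a minimal sketch rather than part of the original tutorial; the helper `try_import` and the package name `package_that_does_not_exist_xyz` are made up for illustration, the latter chosen so that the import fails.

```python
import importlib
import importlib.util
import sys


def try_import(name):
    """Return the imported module, or None if Python cannot find it."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
        return None


# A standard-library module is found and then cached in sys.modules.
json_module = try_import("json")
print(json_module is sys.modules["json"])

# find_spec() reports whether a top-level module can be located without importing it.
print(importlib.util.find_spec("json") is not None)

# Hypothetical name that should not exist on any system.
print(try_import("package_that_does_not_exist_xyz"))
```

Catching `ImportError` like this is mostly useful for optional dependencies; for required ones, letting the error propagate usually gives a clearer message.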
You can also import specific names (such as submodules, classes, or functions) from a package or module using the `from` keyword, as shown here:
from datetime import date
Or, you can import several names, or all public names, using the following commands:
from datetime import date, datetime
from datetime import *
It's also possible for you to give a package an alias by using the `as` keyword. For instance, you could use the `pandas` package as `pd` like this:
import pandas as pd
What constitutes a Python package
Python packages need to adhere to a directory structure, which contains the following files and subdirectories:
A `src` directory with a subdirectory for the package name, including an `__init__.py` initialization file. That subdirectory will also contain one or more `module.py` files with the source code for the package.
A `tests` directory containing scripts to test various package functions. Whether you want to add this directory to the package is up to you.
A `pyproject.toml` file containing your package's configuration details and dependencies.
A `LICENSE` file specifying your package's terms of use. There are many licenses you can choose from, or you can create your own.
A `README.md` file containing clear documentation and instructions for use along with any other information that the package users might find helpful and insightful.
The files and directories mentioned previously are the basic requirements for creating your package. It's possible you'll see something more complex with additional files, submodules, or tests.
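To make the layout concrete, here is a small sketch that creates an empty skeleton with the files and directories listed above. The helper name `scaffold_package` is hypothetical, and the skeleton is written to a temporary directory so that nothing is left behind; the package and module names are the ones used later in this tutorial.

```python
import tempfile
from pathlib import Path


def scaffold_package(root, package_name, module_name):
    """Create an empty src-layout project skeleton under `root`."""
    root = Path(root)
    package_dir = root / "src" / package_name
    package_dir.mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(exist_ok=True)

    files = [
        package_dir / "__init__.py",
        package_dir / f"{module_name}.py",
        root / "pyproject.toml",
        root / "LICENSE",
        root / "README.md",
    ]
    for path in files:
        path.touch(exist_ok=True)

    # Return paths relative to the project root, sorted for stable output.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    for entry in scaffold_package(tmp, "milesConverter", "km2miles"):
        print(entry)
```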
Packaging your code
To make a Python package publicly available, you need to start by creating one. This tutorial will walk you through building a simple Python program that converts kilometers (km) to miles (mi).
Naming your Python package
To name your Python package, first check PyPI to see whether your chosen name is already in use. Developers usually use the same name for the PyPI project and the import package; however, you can choose different names if you want to.
When installing a package, you need to use the PyPI name. When importing a package, you need to use the package name.
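When checking whether a name is taken, it helps to know that package indexes compare names in a normalized form: case is ignored, and runs of `-`, `_`, and `.` are treated as the same separator. The sketch below applies the normalization rule described in PEP 503; the function name `normalize_project_name` is only illustrative.

```python
import re


def normalize_project_name(name):
    """Normalize a project name the way PEP 503 describes for package indexes."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_project_name("milesConverter"))   # milesconverter
print(normalize_project_name("Miles_Converter"))  # miles-converter
print(normalize_project_name("miles.converter"))  # miles-converter
```

Two names that normalize to the same string refer to the same project on the index, so a name can be taken even if the exact spelling you typed returns no results.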
Configuring your Python package
You must structure your package based on the required directory structure discussed previously. Place this code in the `km2miles.py` file. You can create an `__init__.py` file with this code.
The file `km2miles.py` contains a function for converting kilometers to miles using the NumPy library. The `__init__.py` file imports the main module and specifies the package's version.
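The linked files are not reproduced here, so the following is only a rough sketch of what such a module might look like, not the tutorial's actual code. The function name `km2miles` matches the usage example later in this guide; the conversion constant and the use of plain floats instead of NumPy are assumptions made to keep the sketch dependency-free.

```python
# Hypothetical contents of src/milesConverter/km2miles.py (a sketch, not the
# tutorial's actual file, which uses NumPy).

# Assumed conversion factor: 1 km is roughly 0.621371 miles.
MILES_PER_KM = 0.621371


def km2miles(kilometers):
    """Convert a distance in kilometers to miles."""
    return kilometers * MILES_PER_KM


# A matching src/milesConverter/__init__.py would be along these lines
# (also an assumption):
#     from . import km2miles
#     __version__ = "1.0.0"

print(round(km2miles(16), 6))
```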
Before you publish the package, you need to configure the `pyproject.toml` file, where you'll also specify the build system, which you'll learn how to do next.
Configuring the build system
You need a build system to produce the distribution files you publish for the Python package. A build system has two parts: a build frontend, such as pip or build, which you run, and a build backend, such as setuptools, Flit, Hatchling, or PDM, which the frontend calls to create the distribution files.
Here, you'll work with setuptools because it's a mature, widely used backend that many existing Python packages already rely on. This will be helpful when you're creating a Python package for a real use case.
The build system uses the `pyproject.toml` file since it contains the backend tools' instructions to create project distribution packages. The build system specifications are defined in this file:
# pyproject.toml
[build-system]
requires = ["setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"
You may find tutorials online that do not use the `pyproject.toml` file and instead have a `setup.py` file to configure their Python packages. This is because, previously, setuptools used the `setup.py` file to run the installation process for complex packages. However, in modern projects, the `pyproject.toml` file is generally the recommended way of configuring Python projects instead of the `setup.py` or `setup.cfg` files.
Configuring the Python package
Once you've configured the build system, you need to configure the package-specific information. These configuration details are placed below the build system details in the `pyproject.toml` file:
[project]
name = "milesConverter"
version = "1.0.0"
description = "Convert KMs to Miles"
readme = "README.md"
authors = [{ name = "name", email = "name@gmail.com" }]
license = { file = "LICENSE" }
classifiers = [
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3",
    "Operating System :: MacOS :: MacOS X",
    "Operating System :: Microsoft :: Windows",
    "Intended Audience :: Education"
]
keywords = ["converter", "miles", "kms"]
The following details are usually included in project configuration, and it's generally recommended to include them in all packages. However, not all of these details are mandatory:
Name: (mandatory) the name of the package that will be used on PyPI.
Version: (mandatory) the current version of the package, which is also displayed on the main PyPI page for the package.
Description: (optional, but recommended) a one-line summary of the package that will be displayed on the PyPI page.
Authors: (optional, but recommended) information about the developers of the package.
Readme: (optional) a description of the package in Markdown format, including information on how to use it, which will be displayed on the main PyPI page for the package.
License: (optional, but highly recommended) information about how others can use the package.
Classifiers: (optional) categories that describe the target audience, platform compatibility, and maturity of the package, which make it more searchable on PyPI. You can view a full list of acceptable classifiers here.
Keywords: (optional) a list of words that can help users discover the package on PyPI by searching for related terms. For example, a user looking for a package to convert miles to kilometers might search for keywords like "converter", "miles", and "kms".
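A quick way to review these fields is to check a `[project]` table programmatically. The sketch below works on a plain dictionary and treats only `name` and `version` as required, as the packaging specification does; the function name `review_project_table` and the split into "recommended" keys are assumptions for illustration. On Python 3.11 or later, the standard-library `tomllib.load` function can read a `pyproject.toml` file into a dictionary of this shape.

```python
# Keys reviewed by this sketch; the "recommended" grouping is an assumption.
RECOMMENDED_KEYS = (
    "description", "readme", "authors", "license", "classifiers", "keywords",
)


def review_project_table(project):
    """Return (missing_required, missing_recommended) for a [project] table."""
    missing_required = []
    if "name" not in project:
        missing_required.append("name")
    # The version may be listed under "dynamic" and filled in by the build backend.
    if "version" not in project and "version" not in project.get("dynamic", []):
        missing_required.append("version")
    missing_recommended = [key for key in RECOMMENDED_KEYS if key not in project]
    return missing_required, missing_recommended


# Part of the [project] table from this tutorial, written as a Python dict.
project = {
    "name": "milesConverter",
    "version": "1.0.0",
    "description": "Convert KMs to Miles",
    "readme": "README.md",
}

print(review_project_table(project))
```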
Specifying dependencies
To specify dependencies in a project's `pyproject.toml` file, add a `dependencies` field to the `[project]` table, after the project description:
dependencies = ["numpy >= 1.20.3"]
This specifies that the project depends on the NumPy library. When the project is installed, these dependencies will be resolved by pip. When specifying dependencies, keep in mind the following:
Do not list transitive dependencies, that is, packages that your direct dependencies already install for you. List only the packages your code imports directly.
Use `>=` to specify a lower bound for dependencies required for new functionality added in a specific version.
Use `<` to specify an upper bound for dependencies that may cause compatibility issues during a major version update.
Optional dependencies can be added using the `project.optional-dependencies` field:
[project.optional-dependencies]
dev = ["pip-tools", "pytest"]
These dependencies are not installed automatically but can be specified when installing the project using the syntax `pip install package_name[optional_dependency_name]`.
Versioning your package
As specified earlier in the tutorial, you can version your Python package by including a version number in the package's configuration. The version number should follow the commonly used semantic versioning format `MAJOR.MINOR.PATCH`, where `MAJOR` represents backward-incompatible changes in the project, `MINOR` represents new backward-compatible features, and `PATCH` represents bug fixes or performance improvements.
The following shows how to add a version in the configuration file:
version = "1.0.0"
This will specify the version of your project as `1.0.0`. You can then update the version number when you release new project versions.
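The bumping rules for `MAJOR.MINOR.PATCH` can be sketched in a few lines. The helper `bump_version` below is hypothetical and handles only plain three-part versions, not pre-release or build suffixes.

```python
def bump_version(version, part):
    """Return `version` (MAJOR.MINOR.PATCH) with the given part incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # A major bump resets the minor and patch numbers.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets the patch number.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("1.0.0", "patch"))  # 1.0.1
print(bump_version("1.0.1", "minor"))  # 1.1.0
print(bump_version("1.1.0", "major"))  # 2.0.0
```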
Testing your package
You can go about testing your Python package in several different ways. For smaller projects, manual tests will sometimes suffice, but it's always good to have automated tests in place to ensure that the code meets a certain standard without manual intervention. Having automated tests increases confidence in a package's reliability.
Unit tests: Help test specific pieces of code in isolation using a testing framework such as `robot`, `pytest`, `testify`, or `nose`.
Integration tests: Test how different parts of the package work together by setting up test environments and running the code. To implement integration tests, you can use a package like `tox`.
Manual tests: Run and check the code for expected results. You can use any Python client to run manual tests.
You can put all these tests in the `tests` folder described earlier in the tutorial. Complex projects usually require analytics and visibility reports as well as coverage reports for testing. For this purpose, you can use packages like `coverage.py` and `pytest-cov`.
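As an illustration of a unit test, the sketch below uses the standard-library `unittest` module so that it runs without extra installs; `pytest` can also discover and run `unittest.TestCase` classes. The `km2miles` function here is a stand-in with an assumed conversion factor, not the package's real implementation; in a real test file you would import it from the package instead.

```python
import unittest


def km2miles(kilometers):
    """Stand-in for the package function; the factor is an assumed constant."""
    return kilometers * 0.621371


class Km2MilesTests(unittest.TestCase):
    def test_zero_kilometers_is_zero_miles(self):
        self.assertEqual(km2miles(0), 0)

    def test_known_value(self):
        self.assertAlmostEqual(km2miles(16), 9.941936, places=6)

    def test_result_scales_linearly(self):
        self.assertAlmostEqual(km2miles(20), 2 * km2miles(10))


# Run the tests programmatically and report whether they all passed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(Km2MilesTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```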
Adding resource files to your package
To include non-source code files like data files, binaries, manuals, and configuration files in your Python package, you can create a file called `MANIFEST.in`. You can use rules in this file to specify which files should be included or excluded when building the package.
For instance, the following code will include all `.toml` files located in the `src/milesConverter` directory:
# MANIFEST.in
include src/milesConverter/*.toml
You can also specify a separate `.toml` file for this purpose.
Licensing your package
To include a license in your Python package and make it available for others to use, create a file called `LICENSE` in the package directory and choose an existing license from a list of options, such as those provided on choosealicense.com.
In the `pyproject.toml` file, you need to specify the `LICENSE` filename in the `license` field to make the license visible on the PyPI page, which is what you did when you started configuring the package's `pyproject.toml` file.
Some of the most popular open-source licenses are MIT License, Apache License, and GNU General Public License (GPL), but there are many more you can choose from.
Installing your package locally
At this point, you've created all the necessary files and made all the required configurations for your package. Before publishing your package to PyPI, you should try installing it locally to check for any errors, which you can do with pip. Add the `-e` or `--editable` flag, which installs the package from your local directory in editable (development) mode so that changes to the source code take effect without reinstalling, as shown in the following command:
$ pip install -e .
Now that you've installed your package locally, it's ready to be published to PyPI.
Publishing your package to PyPI
To publish your package, you need to start by building it.
Building your package
Use the build package to generate a wheel file for your Python package. To install the build frontend, run the following command:
$ pip install build
Then, navigate to the directory containing your `pyproject.toml` file and run the following command:
$ python -m build
This will create a `dist/` folder in the same location as the `pyproject.toml` file, containing two files:
dist/
├── milesConverter-1.0.0-py3-none-any.whl
└── milesConverter-1.0.0.tar.gz
The `.whl` file is a built distribution (a wheel) and the `.tar.gz` file is a source distribution. These files can eventually be uploaded to PyPI, but you need to test them first.
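The wheel's filename encodes what it is compatible with, following the pattern `name-version[-build]-python-abi-platform.whl`. The sketch below splits a filename into those parts; the helper `parse_wheel_filename` is hypothetical and does no validation beyond counting the components.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its tagged components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


print(parse_wheel_filename("milesConverter-1.0.0-py3-none-any.whl"))
```

Here `py3` means any Python 3 interpreter, `none` means the wheel does not depend on a specific ABI, and `any` means it is not tied to a platform, which is what you would expect for a pure-Python package.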
Testing your package
To test your package you need to install `twine`, a library that will also help you upload your package to PyPI. You can install it using the following command:
$ pip install twine
Once `twine` is installed, you can check if your package description will render properly on PyPI using this command:
$ twine check dist/*
The output of the previous command will look something like this:
(base) gouravbais@Gouravs-Air Python_Package % twine check dist/*
Checking dist/milesConverter-1.0.0-py3-none-any.whl: PASSED
Checking dist/milesConverter-1.1.0-py3-none-any.whl: PASSED
Checking dist/milesConverter-1.0.0.tar.gz: PASSED
Checking dist/milesConverter-1.1.0.tar.gz: PASSED
If all the checks pass, your package is ready to upload.
Uploading your package to the TestPyPI service
Before making your package publicly available, you should test and verify it by uploading it to the TestPyPI service. TestPyPI is a separate instance of PyPI, so you'll first need to create an account on the TestPyPI website and generate an API token there. Then, you can use `twine` to upload your package by running the following command, entering `__token__` as the username and your API token as the password when prompted:
$ twine upload -r testpypi dist/*
After the command successfully runs, you'll be provided a URL to access the results. You'll also be able to search for your package on the TestPyPI website. To download your package from the TestPyPI service, use the following command:
$ pip install -i https://test.pypi.org/simple/ milesConverter==1.0.0
Once downloaded, you'll be able to test your package to ensure that it's working correctly.
Uploading your package to the real PyPI service
Uploading to the real PyPI service is similar to uploading to the TestPyPI service. After you have validated your package in the previous step, you can upload your package to the real PyPI service. Again, use the twine tool to upload your package files to PyPI.
In your terminal, run the following command, replacing `dist/*` with the path to your package files:
$ twine upload dist/*
After you've uploaded the package, you can visit the PyPI website and search for it.
Installing your package
Now you need to test your package. You can download your package with the following command:
$ pip install milesConverter
Once the package is installed, head over to any Python editor or Python runtime and test if your package is working:
from milesConverter import km2miles
a = km2miles.km2miles(16)
print(a)
Your output should look like this:
9.941936
Your package has now been uploaded to PyPI and is accessible through pip to any Python environment.
Publish your package to a private Python index
Previously, you learned why some organizations might want to distribute their packages using a private Python index. Publishing your Python package to a private index is straightforward. You just need to follow these steps:
Set up a directory structure for your package with `.whl` or `.tar.gz` files for distribution purposes.
Configure your web server to serve the root directory of your package with auto-indexing enabled.
Use `twine` to upload your package to the private index, using the URL of your private web server: `$ twine upload --repository-url your_web_server milesConverter/*`.
To install the package from the private index, use the pip command with the URL of your web server (`$ pip install --extra-index-url your_web_server milesConverter`).
After you complete these steps, your package will be available on your private web server for you and your team to use.
Security best practices for creating Python packages
Now that you know how to publish your package, there are a few best practices you should consider before publishing it to ensure your package is secure and functioning correctly.
Pay attention to your dependencies
Before using a third-party dependency in your project, you should review its documentation to see if it has any security or vulnerability checks in place. Many libraries will have a whole section in their documentation for any known security issues and how to handle them. Having this documentation will instill more confidence in you when you go about choosing a third-party package.
If there's no documentation, a tool like Snyk can help you identify vulnerabilities in any third-party libraries you're thinking of using.
Be careful with strings and raw SQL
It's recommended to be very careful when working with strings in Python, especially when using raw SQL in your code. A mishandling in this area can expose you to string injection or SQL injection attacks. An attacker can inject malicious code into a string or, more likely, a SQL query, resulting in unexpected and arbitrary SQL commands being executed in your database.
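You can prevent such attacks by carefully considering how you format strings, especially when taking input from the user. Never let user input act as the format template itself (for example, as the string you call `.format()` on); an f-string is safer in this respect because its template is a literal in your source code. Keep in mind, however, that f-strings do not escape the values they interpolate, so they do not make SQL queries safe.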
Regarding raw SQL, you must use parameterized queries rather than directly concatenating or interpolating user input into the raw SQL string. This helps prevent SQL injection attacks because the database driver passes user input separately from the SQL text, so it is always treated as data rather than as part of the command.
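The difference is easy to see with the standard-library `sqlite3` module. The following is a minimal sketch using an in-memory database; the table, its rows, and the crafted input are made up for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)]
)

# Input crafted to change the meaning of a query built by string formatting.
user_input = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the WHERE clause
# matches every row.
unsafe_query = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = connection.execute(unsafe_query).fetchall()

# Safer: the ? placeholder sends the input as a value, never as SQL.
safe_rows = connection.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))  # 2
print(len(safe_rows))    # 0
connection.close()
```

Other database drivers use different placeholder styles (for example `%s` or named parameters), but the principle of passing values separately from the query text is the same.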
Deserialize cautiously and securely
The `pickle` module in Python is not secure against untrusted input and can be vulnerable to attacks such as the pickle bomb. Libraries built on top of it, such as `dill`, extend `pickle` to serialize more kinds of objects, but they inherit the same risk and should not be used to load untrusted data either.
One of the vulnerabilities in `pickle` is that it can allow the execution of arbitrary code during deserialization. In this situation, an attacker can supply a malicious pickle file that contains code that will be executed when the file is deserialized. To keep your package secure, only unpickle data from sources you trust, prefer a data-only format such as JSON for anything that crosses a trust boundary, and, if you must use `pickle`, restrict which modules and functions are allowed to be unpickled by overriding `pickle.Unpickler.find_class`.
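To show why unpickling untrusted data is dangerous, the sketch below builds a harmless payload that calls `eval` while it is being loaded, then loads it again through an unpickler that only resolves names on an allow-list. This follows the "restricting globals" approach described in the `pickle` documentation; the `Payload` class, the `restricted_loads` helper, and the contents of the allow-list are assumptions for illustration. It reduces the risk but is not a complete defense, so a data-only format remains the safer choice for untrusted input.

```python
import builtins
import io
import pickle

# Assumed allow-list for this sketch; adjust it to what your data really needs.
ALLOWED_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only names on the allow-list may be resolved during unpickling.
        if module == "builtins" and name in ALLOWED_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """Harmless stand-in for a malicious object: unpickling it calls eval()."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


malicious_bytes = pickle.dumps(Payload())
print(pickle.loads(malicious_bytes))  # 2: code ran during loading

try:
    restricted_loads(malicious_bytes)
except pickle.UnpicklingError as error:
    print("blocked:", error)

# Plain data made of built-in containers still loads normally.
print(restricted_loads(pickle.dumps({"km": 16})))  # {'km': 16}
```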
Don't forget to remove secrets
Storing secrets securely is extremely important when working with sensitive data. If your secrets are not stored securely, they can be accessed by an attacker, who could then use those secrets to gain unauthorized access to your system and data. Never hardcode secrets in your package's source code or include them in the files you publish. Instead, store them with a secret management library such as `keyring`, which uses the operating system's credential store, or encrypt them with a maintained cryptography library such as `pycryptodome`. Avoid `pycrypto`, which is no longer maintained. For user passwords, a hashing library such as `passlib` lets you store hashes instead of the passwords themselves.
In addition, you should rotate your secrets periodically. This reduces the risk of them being compromised. Dedicated secret managers such as HashiCorp Vault can help with rotation.
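One common, dependency-free way to keep secrets out of your package's source code is to read them from environment variables at runtime. This is a different technique from the libraries named above, shown here only as a minimal sketch; the helper names `get_secret` and `mask`, the variable name `EXAMPLE_API_TOKEN`, and its value are all made up for illustration.

```python
import os


def get_secret(name):
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"required secret {name!r} is not set")
    return value


def mask(secret, visible=4):
    """Return a version of the secret that is safer to print or log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# For demonstration only: a real deployment would set this outside the code.
os.environ["EXAMPLE_API_TOKEN"] = "not-a-real-token-1234"

token = get_secret("EXAMPLE_API_TOKEN")
print(mask(token))
```

Because the value lives outside the repository, it never ends up in the distribution files you upload, and it can be rotated without releasing a new version of the package.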
Use a tool to scan your code regularly for vulnerabilities
As you've seen here, it's important that you're aware of the various threats that can have an impact on your Python project, especially in the areas of dependency management, string formatting, raw SQL queries, and deserialization. Solving all these problems on your own might be possible, but it's not recommended. It's better to use a tool that can scan and resolve vulnerability issues in your project.
A tool like Snyk can make your life easier by regularly scanning your code for vulnerabilities. With Snyk, you can identify and fix any potential issues with your Python project. This not only helps you secure your project but also helps your application have fewer bugs and functionality issues.
Conclusion
This tutorial showed you how to create a Python package. In addition, you learned why it's a useful method to organize and share your code, and how doing so makes your code reusable and sustainable.
This tutorial also expanded upon why it is extremely important that you're aware of the various threats and vulnerabilities that can mess up your project. You learned how to take care of some of these vulnerabilities and threats yourself by writing more code. However, not all organizations or teams have the capacity to do so, and even if they do, they shouldn't. It's better to use a standard, full-fledged product that takes care of vulnerabilities and threats constantly, making life easier.
A product like Snyk can help you mitigate these threats by scanning your code thoroughly to identify any vulnerabilities and automatically fixing some of those issues.