
The ultimate guide to creating a secure Python package

Author:
Gourav Singh Bais

May 8, 2024

Creating a Python package involves several actions, such as figuring out an apt directory structure, creating package files, and configuring the package metadata before deploying it. There are a few other steps that you need to follow including creating a subdirectory for tests and clear documentation. Once the package is ready, you can distribute it to various distribution archives. With that, your Python package will be ready for others to install and use.

This guide will detail all the steps to build a modern Python package. You can find the code required to follow this tutorial in this GitHub repository. After implementing this tutorial, you'll realize that creating a Python package can be as simple as making use of an existing one, but first, let's start by learning how to use packages in Python.

How to create a Python package

Before you begin creating a Python package, you need to review the various methods of installation and importation, as well as the general structure of the packages. 

Installing packages in Python

To use Python packages, you can utilize public-facing packages from PyPI or private indexes that contain proprietary code within an organization. 

The following are two different ways to install a Python package:

Using PyPI

PyPI (the Python Package Index) is the official public repository for Python packages.

$ pip install package_name

Using a private index

In some scenarios, you may need to use publicly available packages from PyPI and privately available packages from a private index. Private indexes are typically used within large organizations for proprietary projects whose source code will not be shared publicly. These indexes allow you to store and distribute Python packages internally within your organization. 

To install packages from a private index, you can use the pip install command with the --index-url flag, followed by the URL of the private index and the package name. Following is an example of a command you'd need to run to install a package:

$ pip install --index-url <private-index-url> package_name

It's recommended to use Transport Layer Security (TLS) on any private package index for security reasons. If your private index does not have a valid TLS certificate, pip will consider the index untrustworthy, and you'll need to pass the --trusted-host flag with the index's hostname in your pip install command to install packages from it. Because this flag tells pip to trust the host despite the missing certificate, treat it as a temporary workaround.

Importing a Python package

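You can import a package in Python using the import statement. The Python interpreter first checks the cache of already imported modules (sys.modules), then the built-in modules, and finally the directories listed in sys.path, which include the locations of installed packages. If the package is not found, an ImportError (specifically, its subclass ModuleNotFoundError) will be raised.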
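As a rough illustration of that lookup, the sketch below imports modules by name with the standard library's importlib. The helper name try_import and the deliberately nonexistent module name are made up for this example.

```python
import importlib
import sys


def try_import(module_name):
    """Return the imported module, or None if it cannot be found."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        print(f"Could not import {module_name!r}: {exc}")
        return None


# A standard library module is found and then cached in sys.modules.
json_module = try_import("json")
print("json" in sys.modules)

# A module that does not exist anywhere on sys.path triggers the error branch.
missing = try_import("a_module_that_does_not_exist_12345")
```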

You can also import specific names, such as a class, function, or submodule, from a module or package using the from keyword, as shown here:

from datetime import date

Or, you can import several names, or every public name, from a module using the following statements. Wildcard imports make it harder to see where a name comes from, so use them sparingly:

from datetime import date, datetime
from datetime import *

It's also possible for you to give a package an alias by using the as keyword. For instance, you could use the `pandas` package as pd like this:

import pandas as pd

What constitutes a Python package

Python packages need to adhere to a directory structure, which contains the following files and subdirectories:

  • A src directory with a subdirectory named after the package, which includes an __init__.py initialization file and one or more module.py files containing the package's source code.

  • A tests directory containing scripts to test various package functions. Whether you want to add this directory to the package is up to you.

  • A pyproject.toml file that contains your package's configuration details and dependencies.

  • A LICENSE file that specifies your package's terms of use. There are many licenses you can choose from, or you can create your own.

  • A README.md file that contains clear documentation and instructions for use, along with any other information that the package users might find helpful.

The files and directories mentioned previously are the basic requirements for creating your package. It's possible you'll see something more complex with additional files, submodules, or tests.
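To make the layout concrete, here is a minimal sketch that creates an empty skeleton with those files in a temporary directory. The function name create_skeleton and the placeholder package name package_name are assumptions made for illustration; the files are created empty.

```python
import tempfile
from pathlib import Path


def create_skeleton(root, package_name):
    """Create an empty src-layout package skeleton under root."""
    root = Path(root)
    package_dir = root / "src" / package_name
    package_dir.mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(exist_ok=True)

    files = [
        package_dir / "__init__.py",
        package_dir / "module.py",
        root / "pyproject.toml",
        root / "LICENSE",
        root / "README.md",
    ]
    for path in files:
        path.touch(exist_ok=True)

    # Return every created file and directory as a path relative to root.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    layout = create_skeleton(tmp, "package_name")
    for entry in layout:
        print(entry)
```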

Packaging your code

To make a Python package publicly available, you need to start by creating one. This tutorial will walk you through building a simple Python program that converts kilometers (km) to miles (mi).

Naming your Python package

To name your Python package, you need to check PyPI to see if your chosen name is already in use. Usually, developers use the same name for both the PyPI project and the package; however, you can choose a different name if you want to.

When installing a package, you need to use the PyPI name. When importing a package, you need to use the package name.

Configuring your Python package

You must structure your package based on the directory structure discussed previously. Place the conversion code in the km2miles.py file, and create an __init__.py file in the same package directory.

The file km2miles.py contains a function for converting kilometers to miles using the NumPy library. The __init__.py file imports the main module and specifies the package's version. 
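The module itself is not reproduced here, so the following is only a dependency-free sketch of what km2miles.py might look like. The article's own version uses NumPy; the constant name KM_TO_MILES and its value are assumptions, chosen to be consistent with the sample output shown later in this tutorial.

```python
# Hypothetical stand-in for src/milesConverter/km2miles.py.
# The article's module uses NumPy; this sketch avoids that dependency.

KM_TO_MILES = 0.621371  # assumed conversion factor


def km2miles(kilometers):
    """Convert a distance in kilometers to miles."""
    return kilometers * KM_TO_MILES


print(km2miles(16))
```

An accompanying __init__.py would then typically import this module and define a version string, as described above.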

Before you publish the package, you need to configure the pyproject.toml file, where you'll also specify the build system, which you'll learn how to do next. 

Configuring the build system

You need a build system to turn your source files into the distribution files you publish. A build system has two parts that work together: a build frontend, such as pip or build, and a build backend, such as setuptools, Flit, Hatchling, or PDM.

Here, you'll work with setuptools because it's a mature, widely used backend that records your package's dependencies in its metadata so that pip can resolve them at install time. This will be helpful when you're creating a Python package for a real use case.

The build system uses the pyproject.toml file since it contains the backend tools' instructions to create project distribution packages. The build system specifications are defined in this file:

# pyproject.toml
[build-system]
requires = ["setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"

You may find tutorials online that do not use the pyproject.toml file and instead have a setup.py file to configure their Python packages. This is because, previously, setuptools used the setup.py file to run the installation process for complex packages. However, in modern projects, the pyproject.toml file is generally the recommended way of configuring Python projects instead of the setup.py or setup.cfg files.

Configuring the Python package

Once you've configured the build system, you need to configure the package-specific information. These configuration details are placed below the build system details in the .toml file:

[project]
name = "milesConverter"
version = "1.0.0"
description = "Convert KMs to Miles"
readme = "README.md"
authors = [{ name = "name", email = "name@gmail.com" }]
license = { file = "LICENSE" }
classifiers = [
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3",
    "Operating System :: MacOS :: MacOS X",
    "Operating System :: Microsoft :: Windows",
    "Intended Audience :: Education"
]
keywords = ["converter", "miles", "kms"]

The following details are usually included in project configuration, and it's generally recommended to include them in all packages. However, not all of these details are mandatory:

  • Name: (mandatory) the name of the package that will be used on PyPI

  • Version: (mandatory) the current version of the package, which is also displayed on the main PyPI page for the package.

  • Description: (optional, but recommended) a short, one-line summary of the package that will be displayed on the PyPI page.

  • Authors: (optional, but recommended) information about the developers of the package.

  • Readme: (optional) a description of the package in Markdown format, including information on how to use it, which will be displayed on the main PyPI page for the package.

  • License: (optional, but highly recommended) information about how others can use the package.

  • Classifiers: (optional) categories that describe the target audience, platform compatibility, and maturity of the package, which make it more searchable on PyPI. You can view a full list of acceptable classifiers at https://pypi.org/classifiers/.

  • Keywords: (optional) a list of words that can help users discover the package on PyPI by searching for related terms. For example, a user looking for a package to convert kilometers to miles might search for keywords like "converter", "miles", and "kms".

Specifying dependencies

To specify dependencies in a project's `pyproject.toml` file, add a dependencies field to the [project] table:

dependencies = ["numpy >= 1.20.3"]

This specifies that the project depends on the NumPy library. When the project is installed, these dependencies will be resolved by pip. When specifying dependencies, keep in mind the following:

  • Do not list indirect dependencies that your direct dependencies already declare; pip installs those automatically.

  • Use >= to specify a lower bound for dependencies required for new functionality added in a specific version.

  • Use < to specify an upper bound for dependencies that may cause compatibility issues during a major version update.
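To show what a lower and an upper bound mean in practice, here is a simplified sketch that checks whether a version falls inside a range such as >= 1.20.3 and < 2.0.0. It only handles plain numeric versions; real installers follow the full Python version specifier rules, and the function names here are made up for this example.

```python
def parse_version(version):
    """Turn '1.20.3' into (1, 20, 3); handles plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def within_bounds(installed, lower, upper):
    """Return True if lower <= installed < upper."""
    return parse_version(lower) <= parse_version(installed) < parse_version(upper)


# Comparing tuples of integers avoids the string-comparison trap
# where "1.9.0" would sort after "1.20.3".
for candidate in ("1.19.5", "1.20.3", "1.26.4", "2.0.0"):
    print(candidate, within_bounds(candidate, "1.20.3", "2.0.0"))
```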

Optional dependencies can be added using the project.optional-dependencies field:

[project.optional-dependencies]
dev = ["pip-tools", "pytest"]

These dependencies are not installed automatically but can be specified when installing the project using the syntax pip install package_name[optional_dependency_name].

Versioning your package

As shown earlier in the tutorial, you version your Python package by setting a version number in the package's configuration. The version number should follow the commonly used semantic versioning format MAJOR.MINOR.PATCH, where MAJOR represents incompatible changes to the project, MINOR represents new backward-compatible features, and PATCH represents backward-compatible bug fixes or performance improvements.

Following is how you add a version in the configuration file:

version = "1.0.0"

This will specify the version of your project as 1.0.0. You can then update the version number when you release new project versions.
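The MAJOR.MINOR.PATCH rules can be sketched as a small helper that computes the next version number. The function bump_version is hypothetical and only handles plain three-part versions.

```python
def bump_version(version, part):
    """Return the next MAJOR.MINOR.PATCH version for the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # lower parts reset to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("1.0.0", "patch"))
print(bump_version("1.0.0", "minor"))
print(bump_version("1.4.2", "major"))
```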

Testing your package

You can go about testing your Python package in several different ways. For smaller projects, manual tests will sometimes suffice, but it's always good to have automated tests in place to ensure that the code meets a certain standard without manual intervention. Having automated tests increases confidence in a package's reliability.

  • Unit tests: Test specific pieces of code in isolation using a testing framework such as `pytest`, `unittest`, `robot`, or `testify`.

  • Integration tests: Test how different parts of the package work together by setting up test environments and running the code. To automate running tests across multiple environments, you can use a package like `tox`.

  • Manual tests: Run and check the code for expected results. You can use any Python client to run manual tests.

You can put all these tests in the tests folder described earlier in the tutorial. Complex projects usually require analytics and visibility reports as well as coverage reports for testing. For this purpose, you can use packages like coverage.py and `pytest-cov`.
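As an example of what a file in the tests folder could contain, here is a minimal unit test sketch using the standard library's unittest. To keep it self-contained, it defines a stand-in km2miles function with an assumed conversion factor instead of importing the real package.

```python
import unittest


def km2miles(kilometers):
    """Stand-in for the package function so the example is self-contained."""
    return kilometers * 0.621371  # assumed conversion factor


class TestKm2Miles(unittest.TestCase):
    def test_zero(self):
        self.assertEqual(km2miles(0), 0)

    def test_known_value(self):
        self.assertAlmostEqual(km2miles(16), 9.941936, places=6)

    def test_negative_input_stays_negative(self):
        self.assertLess(km2miles(-1), 0)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestKm2Miles)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

In a real project, the test file would import km2miles from the installed package rather than redefine it.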

Adding resource files to your package

To include non-source code files like data files, binaries, manuals, and configuration files in your Python package, you can create a resource file called MANIFEST.in. You can use rules in this file to specify which files should be included or excluded when building the package. 

For instance, the following code will include all .toml files located in the src/milesConverter directory:

# MANIFEST.in
include src/milesConverter/*.toml

You can also specify a separate .toml file for this purpose.

Licensing your package

To include a license in your Python package and make it available for others to use, create a file called LICENSE in the package directory and choose an existing license from a list of options, such as those provided on choosealicense.com.

In the pyproject.toml file, you need to specify the LICENSE filename in the license variable to make the license visible on the PyPI page, which is what you did when you started configuring the package's pyproject.toml file.

Some of the most popular open-source licenses are MIT License, Apache License, and GNU General Public License (GPL), but there are many more you can choose from.

Installing your package locally

At this point, you've created all the necessary files and made all the required configurations for your package. Before publishing your package to PyPI, you should try installing it locally to check for any errors, which you can do with pip. Pass the -e (or --editable) flag to install the package in editable mode, which links the installation to your source directory so that code changes take effect without reinstalling, as shown in the following command:

$ pip install -e .

Now that you've installed your package locally, it's ready to be published to PyPI.

Publishing your package to PyPI

To publish your package, you need to start by building it.

Building your package

Use the build package to generate the distribution files for your Python package. To install the build frontend, run the following command:

$ pip install build

Then, navigate to the directory containing your pyproject.toml file and run the following command:

$ python -m build

This will create a dist/ folder in the same location as the .toml file, containing two files:

dist/
├── milesConverter-1.0.0-py3-none-any.whl
└── milesConverter-1.0.0.tar.gz

The `.whl` file is a built distribution and the `.tar.gz` file is a source distribution. These files can eventually be uploaded to PyPI, but you need to check them first.
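The wheel's filename encodes what it supports. As a rough sketch, the helper below (a hypothetical function, not part of any packaging tool) splits a wheel filename into the fields of the wheel naming convention: distribution name, version, optional build tag, Python tag, ABI tag, and platform tag.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its naming-convention fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:  # an optional build tag follows the version
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_filename("milesConverter-1.0.0-py3-none-any.whl")
print(info)
```

Here, py3-none-any indicates a pure-Python wheel that is not tied to a specific ABI or platform.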

Testing your package

To check your package, you need to install `twine`, a tool that will also help you upload your package to PyPI. You can install it using the following command:

$ pip install twine

Once twine is installed, you can check if your package description will render properly on PyPI using this command:

$ twine check dist/*

The output of the previous command will look something like this:

(base) gouravbais@Gouravs-Air Python_Package % twine check dist/*
Checking dist/milesConverter-1.0.0-py3-none-any.whl: PASSED
Checking dist/milesConverter-1.1.0-py3-none-any.whl: PASSED
Checking dist/milesConverter-1.0.0.tar.gz: PASSED
Checking dist/milesConverter-1.1.0.tar.gz: PASSED

If all the checks pass, your package is ready to be uploaded.

Uploading your package to the TestPyPI service

Before making your package publicly available, you should test and verify it by uploading it to the TestPyPI service. First, you'll need to create an account on the TestPyPI website (test.pypi.org), which is separate from your PyPI account. Then, you can use twine to upload your package by running the following command, which will prompt you for your credentials. PyPI and TestPyPI now require an API token for uploads, which you enter as the password with __token__ as the username:

$ twine upload -r testpypi dist/*

After the command successfully runs, you'll be provided a URL to access the results. You'll also be able to search for your package on the TestPyPI website. To download your package from the TestPyPI service, use the following command:

$ pip install -i https://test.pypi.org/simple/ milesConverter==1.0.0

Once downloaded, you'll be able to test your package to ensure that it's working correctly.

Uploading your package to the real PyPI service

Uploading to the real PyPI service is similar to uploading to the TestPyPI service. After you have validated your package in the previous step, you can upload your package to the real PyPI service. Again, use the twine tool to upload your package files to PyPI. 

In your terminal, run the following command, replacing dist/* with the path to your package files:

$ twine upload dist/*

After you've uploaded the package, you can visit the PyPI website and search for it.

Installing your package

Now you need to test your package. You can download your package with the following command:

$ pip install milesConverter

Once the package is installed, head over to any Python editor or Python runtime and test if your package is working:

from milesConverter import km2miles
a = km2miles.km2miles(16)
print(a)

Your output should look like this:

9.941936

Your package has now been uploaded to PyPI and is accessible through pip to any Python environment.

Publish your package to a private Python index

Previously, you learned why some organizations might want to distribute their packages using a private Python index. Publishing your Python package to a private index is straightforward. You just need to follow these steps:

  • Set up a directory structure for your package with .whl or .tar.gz files for distribution purposes.

  • Configure your web server to serve the root directory of your package with auto-indexing enabled.

  • Use twine to upload your package to the private index, using the URL of your private web server ($ twine upload --repository-url your_web_server milesConverter/*).

  • To install the package from the private index, use the pip command with the URL of your web server ($ pip install --extra-index-url your_web_server milesConverter). Because --extra-index-url makes pip search PyPI as well as your index, use --index-url instead when the package must come only from your private index.

After you complete these steps, your package will be available on your private web server for you and your team to use.

Security best practices for creating Python packages

Now that you know how to publish your package, there are a few best practices you should consider before publishing it to ensure your package is secure and functioning correctly.

Pay attention to your dependencies

Before using a third-party dependency in your project, you should review its documentation to see if it has any security or vulnerability checks in place. Many libraries will have a whole section in their documentation for any known security issues and how to handle them. Having this documentation will instill more confidence in you when you go about choosing a third-party package.

If there's no documentation, a tool like Snyk can help you identify vulnerabilities in any third-party libraries you're thinking of using.

Be careful with strings and raw SQL

It's recommended to be very careful when working with strings in Python, especially when using raw SQL in your code. Mishandling them can expose you to string injection or SQL injection attacks. An attacker can inject malicious code into a string or, more likely, a SQL query, resulting in unexpected and arbitrary SQL commands being executed in your database.

You can prevent such attacks by carefully considering how you format strings, especially when taking input from the user. Never use user input as the format template itself, for example by calling str.format() on a user-supplied string. Keep templates as literals in your code, such as f-strings, and pass user input only as values.
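To illustrate why the template matters, the sketch below contrasts calling str.format() on a user-supplied template with the more restrictive string.Template from the standard library. The Settings class, its api_key value, and both function names are hypothetical.

```python
from string import Template


class Settings:
    """Hypothetical object holding a value that must not leak."""

    def __init__(self):
        self.api_key = "hypothetical-secret"


def render_unsafe(user_template, user, settings):
    # DANGEROUS: the user controls the template, so a field such as
    # {settings.api_key} can reach attributes of the passed objects.
    return user_template.format(user=user, settings=settings)


def render_safe(user_template, user):
    # Template only substitutes $name placeholders and cannot
    # traverse attributes, so braces are left untouched.
    return Template(user_template).safe_substitute(user=user)


attack = "Hi $user {settings.api_key}"
print(render_unsafe(attack, "bob", Settings()))
print(render_safe(attack, "bob"))
```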

Regarding raw SQL, use parameterized queries rather than concatenating or interpolating user input into the SQL string. With a parameterized query, the database driver passes the values separately from the SQL text, so user input is always treated as data and never as part of the command.
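The difference is easy to see with the standard library's sqlite3 module and an in-memory database. The table, the rows, and the function names below are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_user_unsafe(connection, name):
    # DANGEROUS: user input becomes part of the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return connection.execute(query).fetchall()


def find_user_safe(connection, name):
    # The ? placeholder keeps the input as data, separate from the SQL text.
    query = "SELECT name, role FROM users WHERE name = ?"
    return connection.execute(query, (name,)).fetchall()


malicious = "nobody' OR '1'='1"
print(find_user_unsafe(conn, malicious))  # the injected condition matches every row
print(find_user_safe(conn, malicious))    # no user has this literal name
```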

Deserialize cautiously and securely

The pickle module in Python is not secure against untrusted data and can be vulnerable to attacks such as the pickle bomb. To ensure that your package is secure, only unpickle data you trust. Libraries such as `dill` extend pickle to serialize more kinds of objects, but they carry the same risk and are not a secure replacement. For data from untrusted sources, consider a data-only format such as JSON.

One of the vulnerabilities in pickle is that it can allow the execution of arbitrary code during deserialization. In this situation, an attacker can supply a malicious pickle file that contains code that will be executed when the file is deserialized. If you must use pickle, you can reduce the risk by subclassing pickle.Unpickler and overriding its find_class() method to specify which modules or functions are allowed to be unpickled.
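One way to limit what can be unpickled, using only the standard library, is to subclass pickle.Unpickler and override find_class() with an allowlist. The sketch below follows the approach described in the pickle documentation; the names SAFE_BUILTINS and restricted_loads are choices made for this example, and an allowlist reduces the risk rather than making untrusted pickles safe.

```python
import builtins
import io
import pickle

# Only these built-in names may be looked up while unpickling.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse every other global, including os.system and friends.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but with the allowlist applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, 2, range(15)])))

# A handcrafted pickle that tries to call os.system is rejected before it runs.
try:
    restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
except pickle.UnpicklingError as exc:
    print(f"Blocked: {exc}")
```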

Don't forget to remove secrets

Storing secrets securely is extremely important when working with sensitive data. If your secrets are not stored securely, they can be accessed by an attacker, who could then use those secrets to gain unauthorized access to your system and data. You can prevent that from happening by keeping secrets out of your source code and package files and by storing them with a library such as keyring, which saves them in the operating system's credential store, or by encrypting them with pycryptodome. For user passwords, store only hashes created with a library such as passlib. Avoid pycrypto, which is no longer maintained.

In addition, you should rotate your secrets from time to time, which reduces the risk of them being compromised. Tools like HashiCorp Vault can help automate rotation.
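A simple way to keep secrets out of the files you publish is to read them from the environment at runtime. This is a minimal sketch rather than a full secret management setup; the helper names and the EXAMPLE_API_TOKEN variable are hypothetical.

```python
import os


def get_secret(name):
    """Read a secret from the environment instead of hard-coding it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Required secret {name!r} is not set")
    return value


def mask(secret, visible=4):
    """Return a masked form that is safer to show in logs."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# For demonstration only: in real use, the variable is set outside the code,
# for example by your shell, CI system, or secret manager.
os.environ.setdefault("EXAMPLE_API_TOKEN", "demo-token-1234")

token = get_secret("EXAMPLE_API_TOKEN")
print(mask(token))
```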

Use a tool to scan your code regularly for vulnerabilities

As you've seen here, it's important that you're aware of the various threats that can have an impact on your Python project, especially in the areas of dependency management, string formatting, raw SQL queries, and deserialization. Solving all these problems on your own might be possible, but it's not recommended. It's better to use a tool that can scan and resolve vulnerability issues in your project.

A tool like Snyk can make your life easier by regularly scanning your code for vulnerabilities. With Snyk, you can identify and fix any potential issues with your Python project. This not only helps you secure your project but also helps your application have fewer bugs and functionality issues.

Conclusion

This tutorial showed you how to create a Python package. In addition, you learned why it's a useful method to organize and share your code, and how doing so makes your code reusable and sustainable.

This tutorial also expanded upon why it is extremely important that you're aware of the various threats and vulnerabilities that can mess up your project. You learned how to take care of some of these vulnerabilities and threats yourself by writing more code. However, not all organizations or teams have the capacity to do so, and even if they do, they shouldn't. It's better to use a standard, full-fledged product that takes care of vulnerabilities and threats constantly, making life easier.

A product like Snyk can help you mitigate these threats by scanning your code thoroughly to identify any vulnerabilities and automatically fixing some of those issues.