A (soft) introduction to Python dependency management
Python is often described as a “simple” language: easy to pick up, and easy to script for tasks ranging from web scraping and automation to large-scale web applications and data science. However, dependencies are managed quite differently in Python than in other languages, and the many options for setting up environments and package managers only add to the confusion. In this post, we’ll look at different ways to approach Python dependency management, and briefly explore dependency security.
Python as an interpreted language
In an interpreted language, a script is not compiled ahead of time into a standalone machine-code executable. Instead, the script is fed to another program (in this case, the Python interpreter), which executes the code.
If I write a simple Python script like this:
print("Hello world") #main.py
And then run it with:
python main.py
We get the output of:
Hello world
Did you notice that no new executable file, such as an .exe, appeared in the folder this code was run from? That’s because the Python interpreter compiles and runs the file at execution time, not ahead of time.
The trick is that there is still an executable involved: the one that runs the Python code.
main.py gets fed into that executable and then executed. It is the compiled Python interpreter (python.exe on Windows) that is available on your PATH.
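As a small illustration of this point, the standard library's sys module can report which interpreter binary is executing a script. This is only a sketch, and the helper name describe_interpreter is made up for this example.

```python
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this code."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {"executable": sys.executable, "version": version}


info = describe_interpreter()
# On Windows the path typically ends in python.exe; on Linux or macOS it is
# a binary without a file extension.
print(f"Running on Python {info['version']} at {info['executable']}")
```

Running this from different terminals or environments is a quick way to see which interpreter your script is actually being fed into.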
What is a dependency in Python
A dependency in Python is the same as in any other language: a code library that your script imports and needs in order to run.
For example, if you want to interact with a PostgreSQL database, you have two options: write all the functions yourself, or install the postgres package, which already provides querying and the other capabilities you need.
So, how would you install the relevant PostgreSQL package?
pip install postgres
One important aspect of managing dependencies, in any language, is performing some due diligence to ensure that the package is well maintained, and has no security issues. One way of getting a hold of the overall package health is by using the Snyk Advisor.
With Snyk Advisor, you can search for the package health of libraries across different ecosystems, like PyPI, and npm. You can even review Docker base images and their overall statistics, popularity, available tags, and other useful information.
Back to our PostgreSQL-related project, let’s make sure that the package health for postgres is within a reasonable threshold:
Looks like the postgres package doesn’t score very well, with a grade of 56 out of 100. It isn’t very popular, with just about 10,000 downloads a week, and its last release was two years ago, which may hint that its maintenance is lacking.
Scrolling down to the security and community sections on that page, one thing that stands out is the license risk for the latest package — version 3.0.0 from October 19, 2019 — that warrants a closer review:
If you choose to use this package, then after the pip install step above, your system has access to it.
To import it into your projects:
import postgres
Now you should see where dependencies in Python can get messy. If I give this script to a friend who doesn’t have Python installed and they try to run it, they’ll encounter an error.
Why is there an error?
First of all, your friend needs to have Python installed. Secondly, the packages that your project depends on (in this case, postgres) need to be installed too.
Additionally, the Python environment needs all the necessary modules to run the script, much as a program in any other language fails when a specific library or source file is missing.
So there are two requirements if I want to share my Python project with a friend:
- All the necessary files required to run Python
- All the packages that the project depends on
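To make the second requirement concrete, here is a minimal sketch that checks whether the modules a script needs can be found in the current environment before trying to import them. The helper name missing_modules is hypothetical; it relies on importlib.util.find_spec from the standard library and only handles top-level module names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# "postgres" is the package from the example above; "json" ships with Python.
missing = missing_modules(["json", "postgres"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

A friend who has Python but never ran pip install postgres would see postgres reported as missing here, instead of hitting an import error halfway through the script.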
Python dependency management
When you want to share your Python projects with someone, the conventional approach is to include a file named requirements.txt. The recipient can then use the pip command to install exactly the dependencies that the project requires:
pip3 install -r requirements.txt
The reason we use pip3 instead of just pip is to make sure packages are not installed for the wrong version of Python.
The plain pip command points to whichever Python installation comes first on your PATH, which is often the one you installed first. So if that is a Python 2 installation, the pip command will point to pip2, which installs Python 2 packages.
The pip3 command, by contrast, explicitly installs packages for Python 3.
If you do not have any Python 2 version installed at all on your computer, then you do not need to worry about this.
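Another way to sidestep the ambiguity is to run pip as a module of a specific interpreter (python -m pip), so the packages land in that interpreter's environment. The sketch below builds such a command from Python itself; the helper names pip_command and pip_version are made up for this example.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


def pip_version():
    """Ask that pip which version it is and which Python it belongs to."""
    result = subprocess.run(
        pip_command("--version"), capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


command = pip_command("install", "-r", "requirements.txt")
print("Would run:", " ".join(command))
```

Calling pip_version() runs the bound pip with --version, which reports the Python installation it belongs to.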
You can check which version (pip2 or pip3) your pip command uses by running this in the console:
pip --version
The pip3 install -r requirements.txt command reads the text file requirements.txt, which declares all of the dependencies required for this project, and then installs the packages listed in it.
So for packages to actually be installed when someone runs this command:
- You need to create this package manifest file
- Each package listed in this file should be specified in the following format:
<package name> <version comparator> <version>
# requirements.txt
cloudscraper==1.2.58    # Version matching. Must be version 1.2.58
Newspaper3k >= 0.1      # Minimum version 0.1
requests != 3.4         # Version exclusion. Anything except version 3.4
beautifulsoup4 ~= 1.1   # Compatible release. Same as >= 1.1, == 1.*
discord.py              # Installs latest release
The comparators are used for version checking: they ensure that the pip package manager installs the right versions for your project. If no version specifier is given, the latest version is installed.
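To show what the comparators mean in practice, here is a deliberately simplified sketch that evaluates them for plain numeric versions such as 1.2.58. It is not how pip implements version matching (pip follows the full PEP 440 rules, including pre-releases and wildcards), and the function names are made up for this example.

```python
def parse_version(text):
    """Turn a simple dotted version such as "1.2.58" into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(version, length):
    """Right-pad a version tuple with zeros so that 3.4 compares equal to 3.4.0."""
    return version + (0,) * (length - len(version))


def satisfies(version, operator, target):
    """Check one comparator for plain numeric versions (simplified, not full PEP 440)."""
    v, t = parse_version(version), parse_version(target)
    width = max(len(v), len(t))
    vp, tp = pad(v, width), pad(t, width)
    if operator == "==":
        return vp == tp
    if operator == "!=":
        return vp != tp
    if operator == ">=":
        return vp >= tp
    if operator == "~=":
        if len(t) < 2:
            raise ValueError("~= needs at least two version components")
        # Compatible release: at least the target, with everything except the
        # target's last component unchanged (~= 1.1 means >= 1.1, == 1.*).
        return vp >= tp and vp[: len(t) - 1] == t[:-1]
    raise ValueError(f"unsupported operator: {operator}")


print(satisfies("1.5", "~=", "1.1"))    # True
print(satisfies("2.0", "~=", "1.1"))    # False
print(satisfies("3.4.0", "!=", "3.4"))  # False
```

The last call shows why requests != 3.4 also excludes 3.4.0: missing components are treated as zeros when versions are compared.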
So now, when you share your project’s source code with your friends, you can tell them to run the pip3 command with the requirements.txt file that is included with your project, and they will have the open source dependencies they need to run it.
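If you are wondering where the pinned lines in such a file come from, the usual route is pip freeze > requirements.txt. The sketch below does something similar from Python using importlib.metadata from the standard library (Python 3.8 or newer); the function name installed_requirements is made up for this example, and its output will include everything installed in the environment, not only what your project imports.

```python
from importlib import metadata


def installed_requirements():
    """List installed distributions as pinned "name==version" lines."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in installed_requirements():
    print(line)
```

In a clean virtual environment that contains only your project's dependencies, this output is a reasonable starting point for a requirements.txt file.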
What about Python dependency managers?
Beyond the package manifest file that we learned about (requirements.txt), there are several tools you can use to ease the process of maintaining and managing project dependencies in Python. Here are a few you can install and use:
Poetry records the exact versions of your project’s dependencies in a lock file (poetry.lock). When someone runs poetry install, Poetry installs the versions pinned in that lock file, so everyone works with the same set of packages. The lock file also makes it easy to update the dependencies needed for the project in a controlled way.
Poetry is driven from the command line, with commands such as:
poetry add <package>
poetry build
poetry publish
These commands make it easy to add dependencies and track them in the lock file, and subsequently to build your package and publish it to PyPI.
You can also pass the --tree flag to poetry show to display your project’s dependencies as a tree. For reference, this is the layout of a small Poetry project named my-package:
my-package
├── pyproject.toml
├── README.rst
├── my_package
│   └── __init__.py
└── tests
    ├── __init__.py
    └── test_my_package.py
Poetry makes it easy to manage and add dependencies.
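Poetry keeps the dependencies you declare in the pyproject.toml file, under the [tool.poetry.dependencies] table. As a rough sketch, the snippet below reads that table with tomllib (in the standard library from Python 3.11; the tomli backport is assumed on older versions). The project name, versions, and constraints in the sample text are invented for illustration.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Hypothetical pyproject.toml content for a small Poetry project.
PYPROJECT = """
[tool.poetry]
name = "my-package"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.8"
requests = "^2.28"
"""


def declared_dependencies(pyproject_text):
    """Return the dependency table that Poetry reads from pyproject.toml."""
    data = tomllib.loads(pyproject_text)
    return data.get("tool", {}).get("poetry", {}).get("dependencies", {})


deps = declared_dependencies(PYPROJECT)
for name, constraint in deps.items():
    print(f"{name}: {constraint}")
```

The constraints here are ranges; the lock file is where Poetry writes down the exact versions it resolved from them.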
pyenv is a simple tool that lets you switch between different versions of Python. This allows you to check whether your project and its dependencies break on an earlier version of Python. As this is its main use, it lacks the features of dependency managers such as Poetry.
However, if you need a tool to check if your package would work on an earlier version of Python, then pyenv is useful.
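Once you know the oldest Python version your project works on, you can also state it in the code. This is a minimal sketch of such a guard; the minimum of 3.8 and the helper name is_supported are assumptions made for this example.

```python
import sys

MINIMUM_PYTHON = (3, 8)  # hypothetical minimum version for this example


def is_supported(version_info, minimum=MINIMUM_PYTHON):
    """Return True when the interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if is_supported(sys.version_info):
    print("This interpreter is supported.")
else:
    print("Please run this project with Python %d.%d or newer." % MINIMUM_PYTHON)
```

Switching interpreters with pyenv and rerunning a check like this (along with your tests) is one way to confirm the minimum you advertise.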
Setuptools is also a package that helps with package management. It’s a little more involved than something like Poetry, but it gets the job done.
As its name suggests, Setuptools helps get users set up with your package, which here means making sure they have the correct packages to run the script.
For Setuptools, you create two files: a pyproject.toml file and a setup.cfg (or setup.py) file.
In the .toml file, you put the following to tell Python packaging tools that you will be using Setuptools to build your package:
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
Now in your setup.cfg file (or the equivalent setup.py), you’d have something like this:
[metadata]
name = mypackage
version = 0.0.1

[options]
packages = mypackage
install_requires =
    requests
    importlib; python_version == "2.6"
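Since setup.cfg is an INI-style file, you can inspect it with configparser from the standard library. The sketch below reads the install_requires entries from the example above; the helper name read_install_requires is made up, and this is only an illustration of the file format, not how Setuptools itself processes the file.

```python
import configparser

SETUP_CFG = """
[metadata]
name = mypackage
version = 0.0.1

[options]
packages = mypackage
install_requires =
    requests
    importlib; python_version == "2.6"
"""


def read_install_requires(cfg_text):
    """Return the install_requires entries as a list, one requirement per item."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    # The value spans several indented lines; split it and drop the blanks.
    return [line.strip() for line in raw.splitlines() if line.strip()]


requirements = read_install_requires(SETUP_CFG)
for requirement in requirements:
    print(requirement)
```

The second entry carries an environment marker: importlib is only required when the package is installed on Python 2.6.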
Now you’ll need a build tool, such as PyPA build. Your project will be built into a .tar.gz source archive (and typically a wheel as well), ready to distribute.
Setuptools requires more manual labour from you, but it’s efficient at package management and ensuring your users have all the necessary packages to run the script.
These three tools are some of the more popular choices. They each have their strengths and weaknesses. Whichever one you choose, each of them helps tame what can otherwise be a dependency management nightmare in Python.
When it comes to dependency security, the Snyk CLI shows you the dependency path that leads to a vulnerability, as well as how to fix the issue by upgrading to a newer version that includes a security fix.
There are other security concerns beyond just vulnerabilities in open source libraries. For example, insecure code that you might have in your Python applications, or the use of insecure containers that bundle your Python applications, and so on. Daniel Berman wrote an extensive article on getting started with Snyk for secure Python development which I highly recommend as a follow-up to this blog.
I hope you enjoyed learning about Python dependency management and security. If you’d like to read more about Python security, check these out:
- Dependency management tools for Python
- What is package lock json? Lockfiles for yarn & npm packages
- How to maintain npm dependencies in your project
- Python Poetry package manager and security integration with software composition analysis tool
Lucian Irsigler is a passionate programmer who loves the complexities of life.