A (soft) introduction to Python dependency management
September 14, 2021
Python is often described as a “simple” language: easy to use, and easy to write scripts in for tasks ranging from web scraping and automation to building large-scale web applications and doing data science. However, dependencies are managed quite differently in Python than in other languages, and the many options for setting up environments and package managers only add to the confusion. In this post, we'll look at different ways to approach Python dependency management and briefly explore dependency security.
Python as an interpreted language
In an interpreted language, a script is not translated ahead of time into machine code or a standalone executable. Instead, the script is passed to another program, in this case the Python interpreter, which executes the code.
If I write a simple Python script like this:
print("Hello world")  # main.py
And then run it with:
python pathToScript/main.py
We get the output of:
Hello world
Did you notice that no executable file, such as an .exe, appeared in the folder this code was run in? That’s because the Python interpreter compiles and runs the file at run time, not ahead of time.
The trick is that there is an .exe file that executes the Python code: main.py gets funneled into that .exe and then executed. That .exe file is the compiled Python interpreter that is available on your environment path.
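As a small illustration of the point above, a script can ask which interpreter is running it. This is a minimal sketch; the helper name describe_interpreter is made up for this example, and the path it reports depends entirely on your own installation.

```python
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this script."""
    # sys.executable is the absolute path of the Python binary (the ".exe" on Windows).
    version = ".".join(str(part) for part in sys.version_info[:3])
    return sys.executable, version


if __name__ == "__main__":
    path, version = describe_interpreter()
    print(f"Running on Python {version} at {path}")
```

Running this with two different Python installations should report two different paths, which is one way to see that the script itself never becomes an executable.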
What is a dependency in Python
A dependency in Python is like a dependency in any other language: it is a code library that your script imports and needs in order to run.
For example, if you want to interact with a PostgreSQL database, you have two options: write all the functions yourself, or install the postgres package, which already provides querying and the other capabilities you need.
So, how would you install the relevant PostgreSQL package?
pip install postgres
One important aspect of managing dependencies, in any language, is performing some due diligence to ensure that the package is well maintained and has no security issues. One way to get a sense of a package's overall health is to use the Snyk Advisor.
With Snyk Advisor, you can look up the health of libraries across different ecosystems, like PyPI and npm. You can even review Docker base images and their overall statistics, popularity, available tags, and other useful information.
Back to our PostgreSQL-related project, let’s make sure that the package health for postgres is within a reasonable threshold:
It looks like the postgres package doesn’t score very well, with a grade of 56 out of 100. It isn’t very popular, with just about 10,000 downloads a week, and its last release was two years ago, which may hint that its maintenance is lacking.
Scrolling down to the security and community sections on that page, one thing that stands out is the license risk for the latest package — version 3.0.0 from October 19, 2019 — that warrants a closer review:
If you choose to use this package, then after the earlier pip install step your system has access to the postgres package.
To import it into your project:
import postgres
Now you should see where dependencies in Python can get messy. If I give this script to a friend that doesn’t have Python and they try to run it, they’ll encounter an error.
Why is there an error?
First of all, your friend needs to have Python installed. Secondly, they need the packages that your project depends on, in this case postgres.
Additionally, the Python environment also needs all the necessary modules to run the script, which is similar to other languages when a specific library or source code is missing.
So there are two requirements if I want to share my Python project with a friend:
All the necessary files required to run Python
The postgres package
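The second requirement can be checked from a script before anything is imported. The following is a minimal sketch using the standard library's importlib.util.find_spec; the function name missing_modules is hypothetical, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    # find_spec returns None when no installed module matches the name.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["postgres"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

On a machine without the postgres package installed, this reports it as missing instead of failing with an import error halfway through the script.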
Python dependency management
When you want to share your Python project with someone, you should include a file named requirements.txt. They can then use the pip command to install exactly the dependencies that the project requires:
pip3 install -r requirements.txt
The reason we use pip3 instead of just pip is to make sure you do not install packages for the wrong Python version. The plain pip command points to whichever Python installation comes first on your system path, which is often the one you installed first. So if that is a Python 2 installation, the pip command will point to pip2, which installs Python 2 packages.
Using pip3 explicitly tells your system to install the package for Python 3.
If you do not have any Python 2 version installed at all on your computer, then you do not need to worry about this.
You can check which pip (pip2 or pip3) your pip command uses by running this in the console:
pip --version
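Another way to sidestep the pip versus pip3 ambiguity is to invoke pip as a module of a specific interpreter, which is what python -m pip does. Below is a minimal sketch of that idea from inside a script; the helper names pip_command and pip_version are made up for this example, and pip_version assumes pip is installed for the interpreter running the script.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script."""
    # Equivalent to typing: <this python> -m pip <args>
    return [sys.executable, "-m", "pip", *args]


def pip_version():
    """Run `python -m pip --version` and return its output (requires pip)."""
    completed = subprocess.run(
        pip_command("--version"), capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


# Example: the command that would install a project's requirements.
install_cmd = pip_command("install", "-r", "requirements.txt")
```

Because the command starts with sys.executable, the packages always go to the same Python that is running the script, regardless of what the bare pip command points to.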
The pip3 install -r requirements.txt command reads the text file requirements.txt, which declares all of the dependencies required for this project, and then installs the packages listed in it.
So, for packages to actually be installed when someone runs this command, you need to create this package manifest file. Each package listed in the file should be specified in the following format:
<package name> <version specifier> <version>
For example:
# requirements.txt

cloudscraper==1.2.58 # Version matching. Must be version 1.2.58
Newspaper3k >= 0.1 # Minimum version 0.1
requests != 3.4 # Version exclusion. Anything except version 3.4
beautifulsoup4 ~= 1.1 # Compatible release. Same as >= 1.1, == 1.*
discord.py # Installs latest release
The comparators are used for version checking. They ensure that the pip package manager installs the right versions for your project. If no version specifier is given, then the latest version is installed.
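To show roughly what these comparators mean, here is a simplified sketch of the matching logic. It is not how pip is implemented: it handles only plain numeric versions such as 1.2.58 and only four of the operators, and the function names are made up for this example.

```python
def _release(version):
    """Turn '1.2.58' into (1, 2, 58). Only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def _pad(a, b):
    """Pad the shorter tuple with zeros so that 1.1 and 1.1.0 compare equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(installed, operator, wanted):
    """Check whether an installed version meets a single version specifier."""
    have, want = _pad(_release(installed), _release(wanted))
    if operator == "==":
        return have == want
    if operator == "!=":
        return have != want
    if operator == ">=":
        return have >= want
    if operator == "~=":
        # Compatible release: at least `wanted`, and the same leading
        # components once the last component of `wanted` is dropped.
        prefix = _release(wanted)[:-1]
        if not prefix:
            raise ValueError("~= needs at least two version components")
        return have >= want and have[:len(prefix)] == prefix
    raise ValueError(f"unsupported operator: {operator}")
```

For instance, satisfies("1.5", "~=", "1.1") is true while satisfies("2.0", "~=", "1.1") is false, which matches the "same as >= 1.1, == 1.*" comment in the example file.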
So now, if you share your project’s source code with your friends, you can tell them to run the pip3 command with the requirements.txt file that is included with your project, and they will have the necessary open source dependencies to run your project.
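After installing, it can be useful to confirm which of the listed packages are actually present. The sketch below reads requirement names from text in the format shown earlier and looks each one up with the standard library's importlib.metadata (Python 3.8 or newer). The function names are hypothetical, and the line parsing is deliberately simplified: it ignores extras, URLs, and option lines.

```python
import re
from importlib import metadata


def requirement_names(text):
    """Extract package names from requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def installed_versions(names):
    """Map each package name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


SAMPLE = """
cloudscraper==1.2.58 # Version matching
Newspaper3k >= 0.1
discord.py
"""
```

Calling installed_versions(requirement_names(SAMPLE)) returns a dictionary in which any package that has not been installed maps to None.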
What about Python dependency managers?
Beyond the package manifest file that we learned about (requirements.txt), there are some package managers you can use to ease the process of maintaining and managing project dependencies in Python. Here are a few you can install and use:
Poetry
Poetry records the exact versions of your project's dependencies in a lock file, poetry.lock. The lock file is created automatically the first time you install or add dependencies. Its benefit is that everyone who installs the project gets the same versions, while Poetry still makes it easy to update the dependencies the project needs.
Poetry is driven from the command line, with commands such as:
poetry add <package>

poetry build

poetry publish
These commands make it easy to add dependencies and track them in the lock file, and subsequently to build and publish your package to PyPI.
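To make the idea of a lock file concrete, here is a minimal sketch that reads pinned versions from lock-file content. It assumes the [[package]] layout with name and version keys that poetry.lock uses; the sample entries and version numbers are made up for illustration, and the standard library's tomllib module needs Python 3.11 or newer.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    tomllib = None

# Illustrative lock-file fragment; real lock files contain many more fields.
SAMPLE_LOCK = """
[[package]]
name = "requests"
version = "2.26.0"

[[package]]
name = "urllib3"
version = "1.26.6"
"""


def locked_versions(lock_text):
    """Return {package name: pinned version} from lock-file text."""
    if tomllib is None:
        raise RuntimeError("tomllib requires Python 3.11 or newer")
    data = tomllib.loads(lock_text)
    return {pkg["name"]: pkg["version"] for pkg in data.get("package", [])}
```

The point of the sketch is that a lock file pins one exact version per package, including indirect dependencies, rather than a range.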
You can also run poetry show --tree to display your project's dependencies as a tree. For reference, a project scaffolded with poetry new has a layout like this:
my-package
├── pyproject.toml
├── README.rst
├── my_package
│   └── __init__.py
└── tests
    ├── __init__.py
    └── test_my_package.py
Poetry makes it easy to manage and add dependencies.
pyenv
pyenv is a simple tool that lets you switch between different versions of Python. This allows you to check whether your code and its dependencies still work on an earlier version of Python. As this is its main use, it lacks the features of dependency managers such as Poetry.
However, if you need a tool to check whether your package works on an earlier version of Python, then pyenv is useful.
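When you switch interpreters like this, a script can also state the oldest Python version it expects and check that at startup. This is a minimal sketch; the function name running_at_least and the (3, 6) threshold are assumptions chosen for illustration.

```python
import sys


def running_at_least(minimum, current=None):
    """Return True if the (major, minor) version is at least `minimum`."""
    # `current` can be passed explicitly to test other versions.
    current = sys.version_info[:2] if current is None else tuple(current)
    return current >= tuple(minimum)


if __name__ == "__main__":
    if not running_at_least((3, 6)):
        sys.exit("This script needs Python 3.6 or newer.")
    print("Python version is recent enough.")
```

Running the same script under each version you select with pyenv shows quickly where the supported range ends.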
Setuptools
Setuptools is also a package that helps with package management. It's a little more involved than something like Poetry, but it gets the job done.
As the name suggests, Setuptools helps get users set up with your package, which here means making sure they have the correct packages to run your code.
With Setuptools, you create two files: a pyproject.toml file and a setup.py or setup.cfg file.
In the .toml file, you put the following to tell build tools such as pip that you will be using Setuptools to build your package:
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
Now, in your setup.cfg file (setup.py can declare the same information in Python), you’d have something like this:
[metadata]
name = mypackage
version = 0.0.1

[options]
packages = mypackage
install_requires =
    requests
    importlib; python_version == "2.6"
Now you’ll need a build tool, such as PyPA build (run with python -m build). Your project will be built into a .tar.gz source distribution, alongside a wheel by default, and it will be ready to distribute.
Setuptools requires more manual labour from you, but it’s efficient at package management and ensuring your users have all the necessary packages to run the script.
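Since setup.cfg is an INI-style file, its declared dependencies can be inspected with the standard library. The sketch below parses the example configuration from this section with configparser and lists the install_requires entries; the function name install_requires is made up for this example, and this is only a way to read the file, not how Setuptools itself processes it.

```python
import configparser

# The example configuration from this section, embedded for illustration.
SETUP_CFG = """
[metadata]
name = mypackage
version = 0.0.1

[options]
packages = mypackage
install_requires =
    requests
    importlib; python_version == "2.6"
"""


def install_requires(cfg_text):
    """Return the dependency lines declared under [options] install_requires."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    # The value spans several indented lines; keep the non-empty ones.
    return [line.strip() for line in raw.splitlines() if line.strip()]
```

Each returned line is one requirement, optionally followed by an environment marker after the semicolon that limits when it applies.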
These three tools are some of the more popular choices, and each has its strengths and weaknesses. Whichever one you choose, it can go a long way toward taming what is otherwise a dependency management nightmare in Python.
The Snyk CLI can also show you the dependency path that leads to a vulnerability, as well as how to fix the issue, for example by upgrading to a newer version that includes a security fix.
Wrapping up
There are other security concerns beyond just vulnerabilities in open source libraries. For example, insecure code that you might have in your Python applications, or the use of insecure containers that bundle your Python applications, and so on. Daniel Berman wrote an extensive article on getting started with Snyk for secure Python development which I highly recommend as a follow-up to this blog.
I hope you enjoyed learning about Python dependency management and security. If you'd like to read more about Python security, check these out:
What is package lock json? Lockfiles for yarn & npm packages
Python Poetry package manager and security integration with software composition analysis tool
Lucian Irsigler is a passionate programmer who loves the complexities of life.