When the Python programming language was first released in the early 1990s, a Python application was run by pointing the interpreter at a Python script. Everything related to packaging, releasing, and distributing Python projects was done manually. There was no real standard back then, and each project had a long README describing how to install it, along with all its dependencies.
Bigger projects used system packaging tools to release their work—whether it was Debian packages, RPM packages for Red Hat Linux distributions, or MSI packages under Windows. Eventually, the Python modules from those projects all ended up in the site-packages directory of the Python installation, sometimes after a compilation phase, if they had a C extension.
The Python packaging ecosystem has evolved a lot since then. In 1998, Distutils was added to the standard library to provide essential support to create installable distributions for Python projects. Since then, a lot of new tools have emerged from the community to improve how a Python project can be packaged, released, and distributed. This chapter is going to explain how to use the latest Python packaging tools for your microservices.
The other hot topic around packaging is how it fits in with your day-to-day work. When building microservices-based software, you need to deal with many moving parts. When you are working in a particular microservice, you can get away with using the TDD and mocking approach most of the time, which we discussed in Chapter 3, Coding, Testing, and Documentation: the Virtuous Cycle.
However, if you want to do some realistic testing, and examine all the parts of the system, you need the whole stack running either locally or on a test cloud instance. Moreover, developing in such a context can be tedious if you need to reinstall new versions of your microservices all the time. This leads to one question in particular: how can you correctly install the whole stack in your environment and develop in it?
It also means you have to run all the microservices if you want to play with the app. In the case of Jeeves, having to open multiple different shells to run all the microservices is not something a developer would want to do every time they need to run the app.
In this chapter, we are going to look at how we can leverage the packaging tools to run all microservices from the same environment, and then how to run them all from a single Command-Line Interface (CLI) by using a dedicated process manager. First, however, we will look at how to package your projects, and which tools should be utilized.
Python has come a long way since the days of those early packaging methods. Numerous Python Enhancement Proposals (PEPs) were written to improve how to install, release, and distribute Python projects.
Distutils had some flaws that made it a little tedious to release software. The biggest pain points were its lack of dependency management and the way it handled compilation and binary releases. For everything related to compiling, what worked well in the nineties started to look old-fashioned ten years later. No one in the core team made the library evolve, due to lack of interest, and also because Distutils was good enough to compile Python and most projects. People who needed advanced toolchains used other tools, like SCons (http://scons.org/).
In any case, improving the toolchain was not an easy task because of the existing legacy system based on Distutils. Starting a new packaging system from scratch was quite hard, since Distutils was part of the standard library, but introducing backward-compatible changes was also hard to do properly. Improvements ended up being made both inside and outside the standard library: projects like Setuptools and virtualenv were created outside it, while some changes were made directly in Python.
As of the time of writing, you can still find the scars from these changes, and it is still quite hard to know exactly how things should be done. For instance, the pyvenv command was added in early versions of Python 3 and then deprecated in Python 3.6, but Python still ships with its built-in venv module, although there are also tools such as virtualenv to help make life easier.
The best bet is to use the tools that are developed and maintained outside the standard library, because their release cycle is shorter than Python's. In other words, a change in the standard library takes months to be released, whereas a change in a third-party project can be made available much faster. All the third-party projects that are considered part of the de facto standard packaging toolchain are now grouped under the PyPA (https://www.pypa.io) umbrella project.
Besides developing the tools, PyPA also works on improving the packaging standards by proposing PEPs for Python and developing their early specifications—refer to https://www.pypa.io/en/latest/roadmap/. There are often new tools and experiments in packaging and dependency management that let us learn new things, whether or not they become popular. For this chapter, we will stick with the core, well-known tools.
Before we start to look at the tools that should be used, we need to go through a few definitions to avoid any confusion.
When we talk about packaging Python projects, a few terms can be confusing, because their definitions have evolved over time, and also because they can mean slightly different things outside the Python world. We need to define a Python package, a Python project, a Python library, and a Python application. They can be defined as follows:

- A Python package: A directory tree containing Python modules
- A Python project: Everything that is developed and released together; it can contain several packages, along with other resources
- A Python library: A Python project that provides features meant to be imported and used by other Python projects, with no direct end user interface
- A Python application: A Python project that is used directly through a user interface, such as a command-line script or an HTTP API
The distinction between an application and a library can be quite vague, since some libraries offer command-line tools to use some of their features, even if their primary use case is to provide Python packages for other projects. Moreover, a project that started out as a library sometimes becomes an application.
To simplify the process, the best option is to make no distinction between applications and libraries. The only technical difference is that applications ship with more data files and console scripts.
Now that we have defined the terminology around a Python package, project, application, and library, let's look at how projects are packaged.
When you package your Python project, there are three standard files you need to have alongside your Python packages:

- pyproject.toml: A configuration file for the project's build system
- setup.py or setup.cfg: A special module that controls packaging and metadata about the project
- requirements.txt: A file listing dependencies

Let's look at each one of them in detail.
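Most of this chapter focuses on setup.py and requirements.txt; for reference, a minimal pyproject.toml that declares Setuptools as the build backend might look like the following sketch (assuming Setuptools is the backend you choose):

```toml
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
```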
The setup.py file is what governs everything when you want to interact with a Python project. When the setup() function is executed, it generates a static metadata file that follows the PEP 314 format. The metadata file holds all the metadata for the project, but you need to regenerate it via a setup() call to get it into the Python environment you are using.
The reason why you cannot use a static version is that the author of a project might have platform-specific code in setup.py, which generates a different metadata file depending on the platform and Python version. Relying on running a Python module to extract static information about a project has always been a problem. You need to make sure that the code in the module can run in the target Python interpreter. If you are going to make your microservices available to the community, you need to keep that in mind, as the installation happens in many different Python environments.
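To illustrate why the generated metadata can vary, here is a hypothetical setup.py fragment that computes its dependency list from the platform it runs on; the Windows-only dependency name is purely illustrative:

```python
# Sketch: a dependency list computed at setup.py run time, so the
# generated metadata differs between platforms.
import sys

install_requires = ["quart"]
if sys.platform == "win32":
    # Illustrative Windows-only dependency
    install_requires.append("pywin32")

print(install_requires)
```

Because the list is built by running code, an installer cannot know the dependencies without executing the file on the target platform.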
A common mistake when creating the setup.py file is to import your package into it when you have third-party dependencies. If a tool like pip tries to read the metadata by running setup.py, it might raise an import error before it has a chance to list all the dependencies to install. The only dependency you can afford to import directly in your setup.py file is Setuptools, because you can assume that anyone trying to install your project is likely to have it in their environment.
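A common workaround, sketched below with an illustrative helper name, is to extract the version string from the package source with a regular expression instead of importing the package:

```python
import re

def read_version(init_source):
    """Extract __version__ from a module's source text without importing it."""
    match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', init_source)
    if match is None:
        raise ValueError("no __version__ found")
    return match.group(1)

# In a real setup.py, you would pass the content of myproject/__init__.py.
print(read_version('__version__ = "1.0.0"'))  # → 1.0.0
```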
Another important consideration is the metadata you want to include to describe your project. Your project can work with just a name, a version, a URL, and an author, but this is obviously not enough information to describe your project. Metadata fields are set through setup() arguments. Some of them match directly with the name of the metadata, while some do not.
The following is a useful set of arguments you could use for your microservices projects:
- name: The name of the package; should be short and lowercase
- version: The version of the project, as defined in PEP 440
- url: A URL for the project; can be its repository or home page
- description: One sentence to describe the project
- long_description: A reStructuredText or Markdown document
- author and author_email: The name and email of the author—can be an organization
- license: The license used for the project (MIT, Apache2, GPL, and so on)
- classifiers: A list of classifiers picked from a fixed list, as defined in PEP 301
- keywords: Tags to describe your project—this is useful if you publish the project to the Python Package Index (PyPI)
- packages: A list of packages that your project includes—Setuptools can populate that option automatically with the find_packages() function
- entry_points: A list of Setuptools hooks, like console scripts (this is a Setuptools option)
- include_package_data: A flag that simplifies the inclusion of non-Python files
- zip_safe: A flag that prevents Setuptools from installing the project as a ZIP file, which is a historical standard (executable eggs)

If you are missing any critical options, then Setuptools will provide information about the ones it needs when you try to use it. The following is an example of a setup.py file that includes these options:
from setuptools import setup, find_packages

with open("README.rst") as f:
    LONG_DESC = f.read()

setup(
    name="MyProject",
    version="1.0.0",
    url="http://example.com",
    description="This is a cool microservice based on Quart.",
    long_description=LONG_DESC,
    long_description_content_type="text/x-rst",
    author="Simon",
    author_email="[email protected]",
    license="MIT",
    classifiers=[
        "Development Status :: 3 - Alpha",
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3",
    ],
    keywords=["quart", "microservice"],
    packages=find_packages(),
    include_package_data=True,
    zip_safe=False,
    # Console script hook; the target module path is illustrative
    entry_points="""
    [console_scripts]
    mycli = myproject.mycli:main
    """,
    install_requires=["quart"],
)
The long_description option is usually pulled from a README.rst file, so you do not have to deal with including a large piece of reStructuredText as a string in your function.
The Twine project (https://pypi.org/project/twine/)—which we will use later to upload packages to PyPI—has a check command to ensure the long description can be rendered properly. Adding this check to Continuous Integration (CI) as part of a standard test suite is a good idea, to ensure the documentation on PyPI is readable. The other benefit of separating out the description is that it's automatically recognized, parsed, and displayed by most editors. For instance, GitHub uses it as your project landing page in your repository, while also offering an inline reStructuredText editor to change it directly from the browser. PyPI does the same to display the front page of the project.
The license field is freeform, as long as people can recognize the license being used. https://choosealicense.com/ offers impartial advice about which open-source software license is most appropriate for you, if you plan to release the source code—and you should strongly consider it, as our progress through this book and the myriad of tools used have all been based on open-source projects, and adding more to the community helps everyone involved. In any case, you should add, alongside your setup.py file, a LICENSE file with the official text of that license. In open-source projects, it is now common practice to also include a "Code of Conduct," such as the Contributor Covenant: https://www.contributor-covenant.org/. This is because working with people from around the world involves many different cultures and expectations, and being open about the nature of the community is another aspect that helps everyone.
The classifiers option is probably the most painful one to write. You need to use strings from https://pypi.python.org/pypi?%3Aaction=list_classifiers that classify your project. The three most common classifiers that developers use are the list of supported Python versions, the license (which duplicates and should match the license option), and the development status, which is a hint about the maturity of the project.
Trove classifiers are machine-parsable metadata that can be used by tools interacting with PyPI. For example, the zc.buildout tool looks for packages with the Framework :: Buildout :: Recipe classifier. A list of valid classifiers is available at https://pypi.org/classifiers/.
Keywords are a good way to make your project visible if you publish it to the Python Package Index. For instance, if you are creating a Quart microservice, you should use "quart" and "microservice" as keywords.
The entry_points section is an INI-like string that defines ways to interact with your Python module through callables—most commonly a console script. When you add functions to that section, a command-line script will be installed alongside the Python interpreter, with the function hooked to it via the entry point. This is a good way to create a CLI for your project. In the example, mycli should be directly reachable in the shell when the project is installed.

Lastly, install_requires lists all the dependencies. It's a list of Python projects the project uses, and can be used by projects like pip when the installation occurs. The tool will grab them if they are published on PyPI, and install them. It is also possible to read the dependencies from the file we will discuss next, requirements.txt, and to read the version from a separate text file—or JSON file—so that the version can be easily used in multiple places if it's needed in the release pipeline. Since the JSON module is part of the standard library, importing it adds no extra dependency.
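As a sketch of that approach (the helper functions and file layout are illustrative), setup.py could parse a requirements file and a JSON version document like this:

```python
import json

def load_requirements(text):
    """Turn requirements.txt content into an install_requires list,
    skipping blank lines and comments."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements

def load_version(text):
    """Read the version from a JSON document such as {"version": "1.0.0"}."""
    return json.loads(text)["version"]

print(load_requirements("quart\n# a comment\n\nrequests"))  # → ['quart', 'requests']
print(load_version('{"version": "1.0.0"}'))  # → 1.0.0
```

In a real project, the text would come from reading requirements.txt and a version file on disk, with the results passed to setup() as install_requires and version.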
Once this setup.py file is created, a good way to try it is by creating a local virtual environment.

Using the venv module that ships with Python, running these commands in the directory containing the setup.py file will create a few directories, including a bin directory containing a local Python interpreter, and drop you into a local shell:
$ python3 -m venv ./my-project-venv
$ source ./my-project-venv/bin/activate
(my-project-venv) $
There are several helper tools that make managing your virtual environments easier, such as virtualenvwrapper (https://virtualenvwrapper.readthedocs.io/en/latest/), but we will keep to the core functionality in our examples.
From here, running the pip install -e . command will install the project in editable mode. This command installs the project by reading its setup file, but unlike a plain install, the installation occurs in-place. Installing in-place means that you will be able to work directly on the Python modules in the project, and they will be linked to the local Python installation via its site-packages directory.

Using a regular install call would have created copies of the files in the local site-packages directory, and changing the source code would have had no impact on the installed version.
The pip call also generates a MyProject.egg-info directory, which contains the metadata. pip generates version 2.1 of the metadata spec, under the PKG-INFO name:
$ more MyProject.egg-info/PKG-INFO
Metadata-Version: 2.1
Name: MyProject
Version: 1.0.0
Summary: This is a cool microservice based on Quart.
Home-page: http://example.com
Author: Simon
Author-email: [email protected]
License: MIT
Description: long description!
Keywords: quart,microservice
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/x-rst
This metadata file is what describes your project, and is used to register it to PyPI via other commands, as we will see later in the chapter.
The pip call also pulls all the project dependencies by looking for them on PyPI (https://pypi.python.org/pypi) and installs them in the local site-packages directory. Running this command is a good way to make sure everything works as expected.
One standard that emerged from the pip community is the requirements.txt file, which lists all the project dependencies and also offers an extended syntax for installing editable dependencies. Refer to https://pip.pypa.io/en/stable/cli/pip_install/#requirements-file-format.
The following is an example of such a file:
arrow
python-dateutil
pytz
requests
six
stravalib
units
Using this file has been widely adopted by the community, because it makes it easier to document your dependencies. You can create as many requirements files as you want in a project, and have your users call the pip install -r requirements.txt command to install the packages described in them.
For instance, you could have a dev-requirements.txt file, which contains extra tools for development, and a prod-requirements.txt file, which has production-specific dependencies. The format allows inheritance, to help you manage collections of requirements files.
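As an illustration, a dev-requirements.txt file can inherit the base list with pip's -r directive; the development tools listed here are arbitrary examples:

```
# dev-requirements.txt
-r requirements.txt
pytest
flake8
```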
Using requirements files duplicates some of the information contained in the setup.py file's install_requires section. As noted earlier, we could read in the requirements.txt file and include the data in setup.py. Some developers deliberately keep these sources separate to distinguish between an application and a library, allowing a library more flexibility in its dependencies in order to co-operate with other installed libraries. This does mean keeping two sources of information up to date, which is often a source of confusion.

As we said earlier in the chapter, we do not want to make our life complicated by having two different ways to describe Python project dependencies, since the distinction between an application and a library can be quite vague. To avoid duplicating the information in both places, there are some tools in the community that offer syncing automation between setup.py and requirements files.
pip-tools (https://github.com/jazzband/pip-tools) is one of these utilities; it generates a requirements.txt file (or any other filename) via its pip-compile CLI, as follows:
$ pip install pip-tools
...
$ pip-compile
#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile
#
aiofiles==0.6.0
# via quart
blinker==1.4
# via quart
click==7.1.2
# via quart
h11==0.12.0
# via
# hypercorn
# wsproto
…
With no other arguments, pip-compile will examine setup.py. It's also possible to pass it a file of unpinned versions, such as requirements.in, to use as the list of packages instead.
Notice that all the dependencies are pinned—the version we want is in the file. This is always a good idea in a production environment, as we want our application to be reproducible. If we do not specify a version to install, then we will get whatever is the latest, and that may break our application. By specifying the version, we know that all the tests we have run will still be valid no matter how far in the future we deploy that version of our app.
It's also a good idea to add the hash of each dependency to the requirements.txt file, as this avoids any issue with someone uploading a package without updating the version number, or a malicious actor replacing an existing version of a package. These hashes will be compared to the downloaded files on installation, and are only used if they match:
$ pip-compile --generate-hashes
#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --generate-hashes
#
aiofiles==0.6.0 \
    --hash=sha256:bd3019af67f83b739f8e4053c6c0512a7f545b9a8d91aaeab55e6e0f9d123c27 \
    --hash=sha256:e0281b157d3d5d59d803e3f4557dcc9a3dff28a4dd4829a9ff478adae50ca092
    # via quart
blinker==1.4 \
    --hash=sha256:471aee25f3992bd325afa3772f1063dbdbbca947a041b8b89466dc00d606f8b6
    # via quart
click==7.1.2 \
    --hash=sha256:d2b5255c7c6349bc1bd1e59e08cd12acbbd63ce649f2588755783aa94dfb6b1a \
    --hash=sha256:dacca89f4bfadd5de3d7489b7c8a566eee0d3676333fbb50030263894c38c0dc
    # via quart
If you don't use pip-tools, pip has a built-in command called freeze, which you can use to generate a list of all the versions currently installed in your Python virtual environment. Using pip freeze outside a virtual environment is likely to list many packages that are used by other projects, rather than just your own work:
$ pip freeze
aiofiles==0.6.0
blinker==1.4
click==7.1.2
h11==0.12.0
h2==4.0.0
hpack==4.0.0
...
The only problem when you pin your dependencies arises when another project has the same dependencies, but pinned at other versions. pip will complain and fail to meet both requirement sets, and you will not be able to install everything. If you are producing a library, and you expect other people to use it and add it to their own list of dependencies, it is a good idea to specify a range of versions that you support, so that pip can try to sort out any dependency conflicts. For example:
quart>0.13.0,<0.15.0
It's also common practice to leave the dependencies unpinned in the setup.py file and pin them in the requirements.txt file. That way, pip can install the latest version for each package, and when you deploy, specifically in stage or production, you can refresh the versions by running the pip install -r requirements.txt command. pip will then upgrade/downgrade all the dependencies to match the versions, and if you need to, you can tweak them in the requirements file.
To summarize, defining dependencies should be done in each project's setup.py file, and requirements files can be provided with pinned dependencies if you have a reproducible process to generate them from the setup.py file, to avoid duplication.
The next useful file your projects could have is the MANIFEST.in file.
When creating a source or binary release, Setuptools will automatically include all the package modules and data files, the setup.py file, and a few other files in the package archive. Files like pip requirements files will not be included. To add them to your distribution, you need to add a MANIFEST.in file, which contains the list of files to include.
The file follows a simple glob-like syntax, described at https://docs.python.org/3/distutils/commandref.html#creating-a-source-distribution-the-sdist-command, where you refer to a file or a directory pattern and say whether you want to include or prune the matches.
Here's an example from Jeeves:
include requirements.txt
include README.rst
include LICENSE
recursive-include myservice *.ini
recursive-include docs *.rst *.png *.svg *.css *.html conf.py
prune docs/build/*
The docs/ directory containing the Sphinx documentation will be integrated in the source distribution, but any artifact generated locally in docs/build/ when the documentation is built will be pruned.
Once you have the MANIFEST.in file in place, all the files should be added to your distribution when your project is released.
A typical microservice project, as described in this book, will have the following list of files:

- setup.py: The setup file
- README.rst: The content of the long_description option
- MANIFEST.in: The MANIFEST template, if it is needed
- requirements.txt: pip requirements files generated from install_requires
- docs/: The Sphinx documentation
- src/: The project's source code
From there, releasing your project consists of creating a source distribution, which is basically an archive of this structure. If you have some C extensions, you can also create a binary distribution.
Before we learn how to create those releases, let's look at how to pick version numbers for your microservices.
Python packaging tools do not enforce a specific versioning pattern, although the version field should be one that can be converted using the packaging module into a meaningful version. Let's discuss what counts as a meaningful version number. To understand a versioning scheme, an installer needs to know how to sort and compare versions. The installer needs to be able to parse the string and know whether a version is older than another one.
Some software uses a scheme based on the date of release, such as 20210101 for software released on January 1, 2021. For some use cases, this works perfectly well. If you are practicing Continuous Deployment (CD), where every change that reaches the release branch is pushed to production, then there may be such a large number of changes that fixed version numbers are hard to work with. In that sort of situation, a date-based version, or a version derived from the version control hash, may work well.
Date- or commit-based versioning won't work very well if you do branched releases. For instance, if your software has a large change in behavior and you need to support the older version for a while as people transition, then having versions 1 and 2 makes things clear, but using dates in this situation will make some of your "version 1" releases appear as if they were more recent than some of the "version 2" releases, and confuse anyone trying to determine what they should install. Some software combines incremental versions and dates for that reason, but it became obvious that using dates was not the best way to handle branches.
There is also the problem of releasing beta, alpha, release candidate, and dev versions. Developers want to have the ability to mark releases as being pre-releases. For instance, when Python is about to ship a new version, it will ship release candidates using an rcX marker so that the community can try it before the final release is shipped, for example, 3.10.0rc1 or 3.10.0rc2.
For a microservice that you are not releasing to the community, using such markers is often unnecessary—but when you start to have people from outside your organization using your software, it may become useful.
Release candidates can be useful if you are about to ship a backward-incompatible version of a project. It's always a good idea to have your users try it out before it's published. For the usual release though, using candidate releases is probably overkill, as publishing a new release when a problem is found is cheap.
pip does a fairly good job of figuring out most patterns, ultimately falling back to some alphanumeric sorting, but the world would be a better place if all projects used the same versioning scheme. PEP 386, and then PEP 440, were written to try to come up with a versioning scheme for the Python community. It's derived from the standard MAJOR.MINOR[.PATCH] scheme, which is widely adopted among developers, with some specific rules for pre- and post-release versions.
The Semantic Versioning (SemVer) scheme (http://semver.org/) is another standard that emerged in the community, and it is used in many places outside Python. If you use SemVer, you will be compatible with PEP 440 and the pip installer as long as you don't use pre-release markers. For instance, 3.6.0rc2 translates to 3.6.0-rc2 in SemVer.
Unlike PEP 440, SemVer asks that you always provide the three version numbers. For instance, 1.0 should be 1.0.0. The python-semver library (https://github.com/python-semver/python-semver) helps a great deal with comparing different versions:
>>> import semver
>>> version1 = semver.parse_version_info('2.2.3-rc2')
>>> version2 = semver.parse_version_info('2.3.1')
>>> version1 < version2
True
For your microservice project, or any Python project for that matter, you should start with version 0.1.0 to make it clear that it is not yet stable, that it may change drastically during early development, and that backward compatibility is not guaranteed. From there, you can increment the MINOR number at will until you feel the software is mature enough.
Once maturity has been reached, a common pattern is to release 1.0.0, and then start to follow these rules:

- MAJOR is incremented when you introduce a backward-incompatible change to the existing API
- MINOR is incremented when you add new features that do not break the existing API
- PATCH is incremented just for bug fixes

Being strict about this scheme with the 0.x.x series when the software is in its early phase does not make much sense, because you will make a lot of backward-incompatible changes, and your MAJOR version would reach a high number in no time.
The 1.0.0 release is often emotionally charged for developers. They want it to be the first stable release they will give to the world—that's why it's common to use 0.x.x versions and bump to 1.0.0 when the software is deemed stable.
For a library, what we call the API is all the public and documented functions and classes one may import and use. For a microservice, there's a distinction between the code API and the HTTP API. You may completely change the whole implementation in a microservice project and still implement the exact same HTTP API. You need to treat those two versions distinctly.
It's important to remember that version numbers are not decimals, or really any form of counting number, so while it may look like the next version after 3.9 should be 4.0, it does not have to be—3.10 and onward are perfectly acceptable. The numbers are simply a way to order the values and tell which is lower or greater than another.
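The pitfall is easy to demonstrate: comparing version strings lexicographically misorders 3.10 and 3.9, while comparing their numeric components works, as in this small sketch:

```python
def version_tuple(version):
    """Split a simple MAJOR.MINOR[.PATCH] version string into integers."""
    return tuple(int(part) for part in version.split("."))

# Naive string comparison sorts "3.10" before "3.9"...
print("3.10" < "3.9")  # → True
# ...while numeric comparison orders the releases correctly.
print(version_tuple("3.10") > version_tuple("3.9"))  # → True
```

Installers rely on proper parsing rules, such as PEP 440 or SemVer, rather than string ordering for exactly this reason.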
Now that we know how to deal with version numbers, let's do some releasing.
To release your project, you must build a package that can either be uploaded to a package repository such as PyPI or installed directly wherever it is needed. Python has a build utility that makes this process straightforward.
In the following example, we install the build utility, and then run it in the example project we used earlier in this chapter. The output can be quite long, so only some of it is included below:
$ pip install --upgrade build
...
$ python -m build
...
running bdist_wheel
running build
installing to build/bdist.macosx-10.15-x86_64/wheel
running install
running install_egg_info
running egg_info
writing MyProject.egg-info/PKG-INFO
writing dependency_links to MyProject.egg-info/dependency_links.txt
writing requirements to MyProject.egg-info/requires.txt
writing top-level names to MyProject.egg-info/top_level.txt
reading manifest file 'MyProject.egg-info/SOURCES.txt'
writing manifest file 'MyProject.egg-info/SOURCES.txt'
Copying MyProject.egg-info to build/bdist.macosx-10.15-x86_64/wheel/MyProject-1.0.0-py3.8.egg-info
running install_scripts
creating build/bdist.macosx-10.15-x86_64/wheel/MyProject-1.0.0.dist-info/WHEEL
creating '/Users/simon/github/PythonMicroservices/CodeSamples/Chapter9/pyproject-example/dist/tmpcqfu71ms/MyProject-1.0.0-py3-none-any.whl' and adding 'build/bdist.macosx-10.15-x86_64/wheel' to it
adding 'MyProject-1.0.0.dist-info/METADATA'
adding 'MyProject-1.0.0.dist-info/WHEEL'
adding 'MyProject-1.0.0.dist-info/top_level.txt'
adding 'MyProject-1.0.0.dist-info/RECORD'
removing build/bdist.macosx-10.15-x86_64/wheel
The build command reads the information from setup.py and MANIFEST.in, collects all the files, and puts them in an archive. The result is created in the dist directory:
$ ls dist/
MyProject-1.0.0-py3-none-any.whl MyProject-1.0.0.tar.gz
Notice that the name of the archive is composed of the name of the project and its version. The archive is in the Wheel format, defined in PEP 427, which is currently the best format for distributing Python packages, although there have been different methods in the past, which you may encounter in existing projects. This archive can be used directly with pip to install the project, as follows:
$ pip install dist/MyProject-1.0.0-py3-none-any.whl
Processing ./dist/MyProject-1.0.0-py3-none-any.whl
Collecting quart
Using cached Quart-0.15.1-py3-none-any.whl (89 kB)
Collecting hypercorn>=0.11.2
Using cached Hypercorn-0.11.2-py3-none-any.whl (54 kB)
Collecting itsdangerous
Using cached itsdangerous-2.0.1-py3-none-any.whl (18 kB)
…
Installing collected packages: hyperframe, hpack, h11, wsproto, priority, MarkupSafe, h2, werkzeug, jinja2, itsdangerous, hypercorn, click, blinker, aiofiles, quart, MyProject
Successfully installed MarkupSafe-2.0.1 MyProject-1.0.0 aiofiles-0.7.0 blinker-1.4 click-8.0.1 h11-0.12.0 h2-4.0.0 hpack-4.0.0 hypercorn-0.11.2 hyperframe-6.0.1 itsdangerous-2.0.1 jinja2-3.0.1 priority-2.0.0 quart-0.15.1 werkzeug-2.0.1 wsproto-1.0.0
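The wheel's filename is itself structured metadata: per PEP 427 it follows the pattern {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl, so py3-none-any marks a pure-Python package that works with any Python 3, any ABI, and any platform. A small sketch of pulling those fields apart:

```python
# A wheel filename encodes five fields, per PEP 427:
#   {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
# (project names containing dashes are normalized to underscores by the
# build tools, which is why a plain split on "-" works here)
filename = "MyProject-1.0.0-py3-none-any.whl"
name, version, python_tag, abi_tag, platform_tag = filename[: -len(".whl")].split("-")
print(name, version)                       # MyProject 1.0.0
print(python_tag, abi_tag, platform_tag)   # py3 none any
```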
Once you have your archive ready, it's time to distribute it.
If you are developing an open-source project, it is good practice to publish it to PyPI (https://pypi.org), so that it can be used by a wide range of people. If the project is private, or internal to a company, then you may have a package repository for your work that operates in a similar way to PyPI and is only visible to your own organization's infrastructure.
Like most modern language ecosystems, PyPI can be browsed by installers that are looking for releases to download. When you call the pip install <project> command, pip queries PyPI to see whether the project exists and whether there is a suitable release for your platform.
The public name is the name you use in your setup.py file, and you need to register it at PyPI to be able to publish releases. The index uses the first-come, first-served principle, so if the name you have picked is already taken, you will have to choose another one.
When creating microservices for an application or organization, you can use a common prefix for all your projects' names. It is also possible to set up your own private version of PyPI for projects that should not be released to the wider world. If at all possible, though, it helps everyone to contribute to the open-source community.
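As a reminder of where that public name lives, here is a minimal setup.py sketch for the data service; the metadata and dependency list are illustrative, not a complete configuration:

```python
from setuptools import setup, find_packages

setup(
    name="jeeves-dataservice",   # the public name claimed on PyPI
    version="0.1.0",
    packages=find_packages(),
    install_requires=["quart"],  # illustrative runtime dependency
)
```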
At the package level, a prefix can also sometimes be useful to avoid conflicts. Python has a namespace package feature, which allows you to create a top-level package name (like jeeves), and then have packages in separate Python projects, which will end up being installed under the top-level jeeves package.
The effect is that every package gets a common jeeves namespace when you import them, which is quite an elegant way to group your code under the same banner. The feature is available through the pkgutil module from the standard library.
To do this, you just need to create the same top-level directory in every project, with an __init__.py file containing the following two lines, and prefix all absolute imports with the top-level name:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
For example, in Jeeves, if we decide to release everything under the same namespace, each project can have the same top-level package name. In the tokendealer, it could be as follows:
jeeves/
    __init__.py: contains the extend_path call
    tokendealer/
And then in the dataservice directory, like this:
jeeves/
    __init__.py: contains the extend_path call
    dataservice/
Both will ship a jeeves top-level namespace, and when pip installs them, the tokendealer and dataservice packages will both end up installed and available underneath the name jeeves:
>>> from jeeves import tokendealer, dataservice
This feature is not that useful in production, where each microservice is deployed in a separate installation, but it does not hurt, and it can be useful if you start to create a lot of libraries that are used across projects. For now, we will make the assumption that each project is independent, and each name is available at PyPI.
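To see the mechanism in action without publishing anything, the following self-contained sketch builds two fake project directories on the fly, each shipping a jeeves namespace package with the extend_path boilerplate shown above, and then imports both subpackages. The directory and attribute names are illustrative:

```python
import os
import sys
import tempfile

# Two independent "projects" each ship a top-level jeeves package;
# pkgutil.extend_path merges their contents at import time.
INIT = "from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n"

root = tempfile.mkdtemp()
for project, subpkg in (("project_a", "tokendealer"), ("project_b", "dataservice")):
    pkg_dir = os.path.join(root, project, "jeeves", subpkg)
    os.makedirs(pkg_dir)
    with open(os.path.join(root, project, "jeeves", "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(f"NAME = {subpkg!r}\n")
    sys.path.insert(0, os.path.join(root, project))

# Both subpackages are importable under the shared jeeves namespace,
# even though they live in two separate directories.
from jeeves import tokendealer, dataservice
print(tokendealer.NAME, dataservice.NAME)  # tokendealer dataservice
```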
To publish the releases at PyPI, you first need to register a new user using the form at https://pypi.org/account/register/, which will look like that shown in Figure 9.1.
Figure 9.1: Creating an account on PyPI
It's also worth registering at the test version of PyPI, as this will let you experiment with uploads and try out all the commands without publishing anything to the real index. Use https://test.pypi.org/account/register/ for an account on the test service.
Python Distutils has register and upload commands to register a new project at PyPI, but it is better to use Twine (https://github.com/pypa/twine), which comes with a better user interface. Once you have installed Twine (using the pip install twine command), the next step is to register your package using the following command:
$ twine register dist/jeeves-dataservice-0.1.0.tar.gz
Note that on today's PyPI, this explicit registration step is no longer supported: a project is registered automatically the first time you upload a release, so twine register is only useful for legacy package servers.
Once done, you can go ahead and upload the releases. Let's upload to the test version of PyPI first, to make sure everything works. After the upload, we give pip some extra arguments so that it knows to use the test version of PyPI, and then to fall back to the real package index to resolve the other dependencies:
$ twine upload --repository testpypi dist/*
$ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple jeeves-dataservice
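Rather than passing repository details on every invocation, Twine can also read them from a ~/.pypirc file. A sketch, with placeholder values you would replace with real API tokens:

```ini
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
username = __token__
; placeholder: replace with a real PyPI API token
password = pypi-XXXX

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
; placeholder: replace with a real TestPyPI API token
password = pypi-XXXX
```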
Once we know everything is working, we can upload to the real package index:
$ twine upload dist/*
From there, your package should appear in the index, with an HTML home page at https://pypi.org/project/<project>/. The pip install <project> command should work!
Now that we know how to package each microservice, let us see how to run them all in the same box for development purposes.
So far, we have run our Quart applications using the built-in Quart wrapper, or the run() function. This works well for development, as the app can detect changes to its source code and reload itself, saving time when making changes. However, there are limitations to this approach, not least of which is that the server runs in a development mode, with extra diagnostics turned on that slow down its operation.
Instead, we should run our applications using Hypercorn (https://pgjones.gitlab.io/hypercorn/), an ASGI web server that allows Quart to run to its full potential, supporting HTTP/2 and HTTP/3, as well as WebSockets. It is already installed alongside Quart and is very straightforward to use. For our dataservice application, we would run:
$ hypercorn dataservice:app
Hypercorn is the latest in a long line of WSGI and ASGI servers that aim to serve web applications, and if you are searching the Flask documentation when looking into extensions, you may come across mentions of Gunicorn (https://gunicorn.org/), as it is a common equivalent to Hypercorn for synchronous applications, using a worker pool model to provide concurrency, an option we discussed in Chapter 1, Understanding Microservices. For Quart, though, we will stick with Hypercorn.
The last piece of the puzzle is to avoid having to run each console script in a separate Bash window. We want to manage those processes with a single script. Let's see in the next section how we can do this with a process manager.
Hypercorn specializes in running web apps. If you want to deploy a development environment with a few other processes, you have to manage several different Python microservices, a RabbitMQ instance, a database, and whatever else you use. In order to make life easier in your development environment, you will need to use another process manager.
A good option is a tool like Circus (http://circus.readthedocs.io), which can run any kind of process, even when they are not ASGI or WSGI applications. It also has the ability to bind sockets and make them available for the managed processes. In other words, Circus can run a Quart app with several processes, and can also manage some other processes if needed.
Circus is a Python application, so, to use it, you can simply run the command pip install circus. Once Circus is installed, it provides a few commands, through the entry_points method described earlier. The two principal commands are circusd, which is the process manager, and circusctl, which lets you control the process manager from the command line. Circus uses an INI-like configuration file, where you can list the commands to run in dedicated sections and, for each one of them, the number of processes you want to use.
Circus can also bind sockets, and let the forked process use them via their file descriptors. When a socket is created on your system, it uses a File Descriptor (FD), which is a system handle a program can use to reach a file or an I/O resource like sockets. A process that is forked from another one inherits all its file descriptors. That is, through this mechanism, all the processes launched by Circus can share the same sockets.
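That inheritance mechanism is plain POSIX behavior, and can be demonstrated in a few lines of Python without Circus (POSIX-only, since it uses os.fork()):

```python
import os
import socket

# A parent process binds a listening socket; a forked child inherits the
# file descriptor and can wrap it in its own socket object. This is the
# mechanism Circus relies on to share one socket across worker processes.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
sock.listen()
port = sock.getsockname()[1]

pid = os.fork()
if pid == 0:
    # Child: rebuild a socket object around the inherited descriptor.
    inherited = socket.socket(fileno=sock.fileno())
    # Exit with 0 only if the child sees the very same bound port.
    os._exit(0 if inherited.getsockname()[1] == port else 1)

_, status = os.waitpid(pid, 0)
print("child shared the socket:", os.WEXITSTATUS(status) == 0)
```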
In the following example, two commands are being run. One will run five processes for the Quart application, located in the server.py module, using the virtualenv provided in the virtualenv path; the second command will run one Redis server process:
[watcher:web]
cmd = hypercorn --bind fd://$(circus.sockets.web) server:app
use_sockets = True
numprocesses = 5
virtualenv = ./venvs/circus-virtualenv/
copy_env = True
[watcher:redis]
cmd = /usr/local/bin/redis-server
use_sockets = False
numprocesses = 1
[socket:web]
host = 0.0.0.0
port = 8000
The socket:web section describes what host and port to use to bind the TCP socket, and the watcher:web section uses it via the $(circus.sockets.web) variable. When Circus runs, it replaces that variable with the FD value for the socket. To run this script, you can use the circusd command line:
$ circusd myconfig.ini
For our microservices, using Circus means we can simply create a watcher and a socket section per service and start them all using the circusd command.
Circus also offers options to redirect the stdout and stderr streams to log files to facilitate debugging, along with numerous other features that can be found at https://circus.readthedocs.io/en/latest/for-ops/configuration/.
In this chapter, we have looked at how to package, release, and distribute each microservice. The current state of the art in Python packaging still requires some knowledge of the legacy tools, and this will be the case for some years until all the ongoing work in Python and PyPA becomes mainstream. But, provided you have a standard, reproducible, and documented way to package and install your microservices, you should be fine.
Having numerous projects to run a single application adds a lot of complexity when you are developing it, and it's important to be able to run all the pieces from within the same box. Tools like pip's development mode and Circus are useful for this, as they simplify how you run the whole stack, but they still require that you install tools on your system, even if it is inside a virtualenv.
The other issue with running everything from your local computer is that you might not use an operating system that will be used to run your services in production, or you may have some libraries installed for other purposes, which might interfere.
The best way to prevent this problem is to run your stack in an isolated environment. This is what the next chapter will cover: how to run your services inside a container.