9

Packaging and Running Python

When the Python programming language was first released in the early 1990s, a Python application was run by pointing the Python scripts to the interpreter. Everything related to packaging, releasing, and distributing Python projects was done manually. There was no real standard back then, and each project had a long README on how to install it with all its dependencies.

Bigger projects used system packaging tools to release their work—whether it was Debian packages, RPM packages for Red Hat Linux distributions, or MSI packages under Windows. Eventually, the Python modules from those projects all ended up in the site-packages directory of the Python installation, sometimes after a compilation phase, if they had a C extension.

The Python packaging ecosystem has evolved a lot since then. Distutils, created in 1998, was added to the standard library to provide essential support for creating installable distributions of Python projects. Since then, a lot of new tools have emerged from the community to improve how a Python project can be packaged, released, and distributed. This chapter is going to explain how to use the latest Python packaging tools for your microservices.

The other hot topic around packaging is how it fits in with your day-to-day work. When building microservices-based software, you need to deal with many moving parts. When you are working in a particular microservice, you can get away with using the TDD and mocking approach most of the time, which we discussed in Chapter 3, Coding, Testing, and Documentation: the Virtuous Cycle.

However, if you want to do some realistic testing, and examine all the parts of the system, you need the whole stack running either locally or on a test cloud instance. Moreover, developing in such a context can be tedious if you need to reinstall new versions of your microservices all the time. This leads to one question in particular: how can you correctly install the whole stack in your environment and develop in it?

It also means you have to run all the microservices if you want to play with the app. In the case of Jeeves, having to open multiple different shells to run all the microservices is not something a developer would want to do every time they need to run the app.

In this chapter, we are going to look at how we can leverage the packaging tools to run all microservices from the same environment, and then how to run them all from a single Command-Line Interface (CLI) by using a dedicated process manager. First, however, we will look at how to package your projects, and which tools should be utilized.

The packaging toolchain

Python has come a long way since the days of those early packaging methods. Numerous Python Enhancement Proposals (PEPs) were written to improve how to install, release, and distribute Python projects.

Distutils had some flaws that made it a little tedious to release software. The biggest pain points were its lack of dependency management and the way it handled compilation and binary releases. For everything related to compiling, what worked well in the nineties started to get old-fashioned ten years later. No one in the core team made the library evolve due to lack of interest, and also because Distutils was good enough to compile Python and most projects. People who needed advanced toolchains used other tools, like SCons (http://scons.org/).

In any case, improving the toolchain was not an easy task because of the existing legacy system based on Distutils. Starting a new packaging system from scratch was quite hard, since Distutils was part of the standard library, but introducing backward-compatible changes was also hard to do properly. The improvements were made in between. Projects like Setuptools and virtualenv were created outside the standard library, and some changes were made directly in Python.

As of the time of writing, you still find the scars from these changes, and it is still quite hard to know exactly how things should be done. For instance, the pyvenv command was added in early versions of Python 3, deprecated in Python 3.6, and removed in Python 3.8, but Python still ships with its built-in venv module, although there are also tools such as virtualenv to help make life easier.

The best bet is to use the tools that are developed and maintained outside the standard library, because their release cycle is shorter than Python's. In other words, a change in the standard library takes months to be released, whereas a change in a third-party project can be made available much faster. All the third-party projects that are considered part of the de facto standard packaging toolchain are now grouped under the PyPA (https://www.pypa.io) umbrella project.

Besides developing the tools, PyPA also works on improving the packaging standards through proposing PEPs for Python and developing its early specifications—refer to https://www.pypa.io/en/latest/roadmap/. There are often new tools and experiments in packaging and dependency management that let us learn new things whether or not they become popular. For this chapter, we will stick with the core, well-known tools.

Before we start to look at the tools that should be used, we need to go through a few definitions to avoid any confusion.

A few definitions

When we talk about packaging Python projects, a few terms can be confusing, because their definitions have evolved over time, and also because they can mean slightly different things outside the Python world. We need to define a Python package, a Python project, a Python library, and a Python application. They are defined as follows:

  • A Python package is a directory tree containing Python modules. You can import it, and it is part of the module namespace.
  • A Python project can contain several packages, modules, and other resources and is what you release. Each microservice you build with Flask is a Python project.
  • A Python application is a Python project that can be directly used through a user interface. The user interface can be a command-line script or a web server.
  • Lastly, a Python library is a specific kind of Python project that provides features to be used in other Python projects and has no direct end-user interface.

The distinction between an application and a library can be quite vague, since some libraries offer command-line tools to access some of their features, even though their primary purpose is to provide Python packages for other projects. Moreover, a project that started as a library sometimes becomes an application.

To simplify the process, the best option is to make no distinction between applications and libraries. The only technical difference is that applications ship with more data files and console scripts.

Now that we have defined the terminology around a Python package, project, application, and library, let's look at how projects are packaged.

Packaging

When you package your Python project, there are three standard files you need to have alongside your Python packages:

  • pyproject.toml: A configuration file for the project's build system
  • setup.py or setup.cfg: A special module (or its declarative configuration counterpart) that controls the packaging and metadata of the project
  • requirements.txt: A file listing dependencies

Let's look at each one of them in detail.

The setup.py file

The setup.py file is what governs everything when you want to interact with a Python project. When the setup() function is executed, it generates a static metadata file following the standard metadata format (originally defined in PEP 314 and extended by later PEPs). The metadata file holds all the metadata for the project, but you need to regenerate it via a setup() call to get it into the Python environment you are using.

The reason why you cannot use a static version is that the author of a project might have platform-specific code in setup.py, which generates a different metadata file depending on the platform and Python versions. Relying on running a Python module to extract static information about a project has always been a problem. You need to make sure that the code in the module can run in the target Python interpreter. If you are going to make your microservices available to the community, you need to keep that in mind, as the installation happens in many different Python environments.

A common mistake when creating the setup.py file is to import your package into it when you have third-party dependencies. If a tool like pip tries to read the metadata by running setup.py, it might raise an import error before it has a chance to list all the dependencies to install. The only dependency you can afford to import directly in your setup.py file is Setuptools, because you can assume that anyone trying to install your project is likely to have it in their environment.
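If you do need information from your package inside setup.py, such as a version string, a safe workaround is to read the source file as text rather than importing it. The following is a minimal sketch; the myservice package and its __version__ attribute are hypothetical:

    import re
    from pathlib import Path

    from setuptools import setup, find_packages

    # Read the version string from myservice/__init__.py as plain text, so
    # that setup.py never imports the package or its third-party dependencies.
    init_text = Path("myservice/__init__.py").read_text()
    version = re.search(r'__version__\s*=\s*"([^"]+)"', init_text).group(1)

    setup(
        name="myservice",
        version=version,
        packages=find_packages(),
    )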

Another important consideration is the metadata you want to include to describe your project. Your project can work with just a name, a version, a URL, and an author, but this is obviously not enough information to describe your project. Metadata fields are set through setup() arguments. Some of them match directly with the name of the metadata, while some do not.

The following is a useful set of arguments you could use for your microservices projects:

  • name: The name of the package; should be short and lowercase
  • version: The version of the project, as defined in PEP 440
  • url: A URL for the project; can be its repository or home page
  • description: One sentence to describe the project
  • long_description: A reStructuredText or Markdown document
  • author and author_email: The name and email of the author—can be an organization
  • license: The license used for the project (MIT, Apache2, GPL, and so on)
  • classifiers: A list of classifiers picked from a fixed list, as defined in PEP 301
  • keywords: Tags to describe your project—this is useful if you publish the project to the Python Package Index (PyPI)
  • packages: A list of packages that your project includes—Setuptools can populate that option automatically with the find_packages() method
  • entry_points: A list of Setuptools hooks, like console scripts (this is a Setuptools option)
  • include_package_data: A flag that simplifies the inclusion of non-Python files
  • zip_safe: A flag that prevents Setuptools from installing the project as a ZIP file, which is a historical standard (executable eggs)

If you are missing any critical options, then Setuptools will provide information about the ones it needs when you try to use it. The following is an example of a setup.py file that includes these options:

    from setuptools import setup, find_packages
 
    with open("README.rst") as f:
        LONG_DESC = f.read()
 
    setup(
        name="MyProject",
        version="1.0.0",
        url="http://example.com",
        description="This is a cool microservice based on Quart.",
        long_description=LONG_DESC,
        long_description_content_type="text/x-rst",
        author="Simon",
        author_email="[email protected]",
        license="MIT",
        classifiers=[
            "Development Status :: 3 - Alpha",
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python :: 3",
        ],
        keywords=["quart", "microservice"],
        packages=find_packages(),
        include_package_data=True,
        zip_safe=False,
        install_requires=["quart"],
    ) 

The long_description option is usually pulled from a README.rst file, so you do not have to deal with including a large piece of reStructuredText string in your function.

The Twine project (https://pypi.org/project/twine/)—which we will use later to upload packages to PyPI—has a check command to ensure the long description can be rendered properly. Adding this check to Continuous Integration (CI) as part of a standard test suite is a good idea, to ensure the documentation on PyPI is readable. The other benefit of separating the description is that it's automatically recognized, parsed, and displayed by most editors. For instance, GitHub uses it as your project landing page in your repository, while also offering an inline reStructuredText editor to change it directly from the browser. PyPI does the same to display the front page of the project.

The license field is freeform, as long as people can recognize the license being used. https://choosealicense.com/ offers impartial advice about which open-source software license is most appropriate for you if you plan to release the source code, and you should strongly consider doing so, as our progress through this book and the myriad of tools used have all been based on open-source projects, and adding more to the community helps everyone involved. In any case, you should add, alongside your setup.py file, a LICENSE file with the official text of that license. In open-source projects, it is also common practice now to include a "Code of Conduct," such as the Contributor Covenant: https://www.contributor-covenant.org/.

This is because working with people from around the world involves many different cultures and expectations, and being open about the nature of the community is another aspect that helps everyone.

The classifiers option is probably the most painful one to write. You need to use strings from https://pypi.python.org/pypi?%3Aaction=list_classifiers that classify your project. The three most common classifiers that developers use are the list of supported Python versions, the license (which duplicates and should match the license option), and the development status, which is a hint about the maturity of the project.

Trove classifiers are machine-parsable metadata that can be used by tools interacting with PyPI. For example, the zc.buildout tool looks for packages with the Framework :: Buildout :: Recipe classifier. The full list of valid classifiers is available at https://pypi.org/classifiers/.

Keywords are a good way to make your project visible if you publish it to the Python Package Index. For instance, if you are creating a Quart microservice, you should use "quart" and "microservice" as keywords.

The entry_points section is an INI-like string that defines ways to interact with your Python module through callables, most commonly console scripts. When you add a function in that section, a command-line script is installed alongside the Python interpreter, and the function is hooked to it via the entry point. This is a good way to create a CLI for your project. For instance, a console script entry point named mycli would be directly reachable in the shell once the project is installed, as shown in the sketch below.

Lastly, install_requires lists all the dependencies. It's a list of Python projects that the project depends on, which pip can use when the installation occurs. The tool will grab them if they are published on PyPI and install them. It is also possible to read the dependencies from the file we will be discussing next, requirements.txt, and to read the version from a separate text file, or a JSON file, so that the version can easily be reused in other places, such as a release pipeline. Since the json module is part of the standard library, importing it adds no extra dependency.
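Here is what the mycli console script declaration mentioned above could look like; the command name and the myservice.cli:main callable are hypothetical:

    from setuptools import setup, find_packages

    setup(
        name="MyProject",
        version="1.0.0",
        packages=find_packages(),
        # INI-like string: each line maps a command name to "module:callable".
        entry_points="""
        [console_scripts]
        mycli = myservice.cli:main
        """,
    )

Once such a project is installed, typing mycli in the shell runs the main() function.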

Once this setup.py file is created, a good way to try it is by creating a local virtual environment.

Assuming you have a Python 3 installation with its built-in venv module, running these commands in the directory containing the setup.py file will create a virtual environment, including a bin directory with a local Python interpreter, and activating it will switch your shell to that environment:

$ python3 -m venv ./my-project-venv
$ source ./my-project-venv/bin/activate 
(my-project-venv) $ 

There are several helper tools to make managing your virtual environments easier, such as virtualenvwrapper (https://virtualenvwrapper.readthedocs.io/en/latest/), but we will keep to the core functionality with our examples.

From here, running the pip install -e . command will install the project in editable mode. This command installs the project by reading its setup file but, unlike a regular install, the installation occurs in-place. Installing in-place means that you will be able to work directly on the Python modules in the project, and they will be linked to the local Python installation via its site-packages directory.

Using a regular install call would have created copies of the files in the local site-packages directory, and changing the source code would have had no impact on the installed version.
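For reference, the editable install from inside the activated virtual environment is simply the following command, run from the directory containing setup.py (output omitted):

(my-project-venv) $ pip install -e .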

The pip call also generates a MyProject.egg-info directory, which contains the metadata. pip writes the metadata under the PKG-INFO name, using version 2.1 of the metadata spec:

$ more MyProject.egg-info/PKG-INFO
Metadata-Version: 2.1
Name: MyProject
Version: 1.0.0
Summary: This is a cool microservice based on Quart.
Home-page: http://example.com
Author: Simon
Author-email: [email protected]
License: MIT
Description: long description!
 
Keywords: quart,microservice
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/x-rst

This metadata file is what describes your project, and is used to register it to PyPI via other commands, as we will see later in the chapter.

The pip call also pulls all the project dependencies by looking for them in PyPI on https://pypi.python.org/pypi and installs them in the local site-packages. Running this command is a good way to make sure everything works as expected.

The requirements.txt file

One standard that emerged from the pip community is to use a requirements.txt file, which lists all the project dependencies, but also proposes an extended syntax to install editable dependencies. Refer to https://pip.pypa.io/en/stable/cli/pip_install/#requirements-file-format.

The following is an example of such a file:

arrow 
python-dateutil 
pytz 
requests 
six 
stravalib 
units 

Using this file has been widely adopted by the community, because it makes it easier to document your dependencies. You can create as many requirements files as you want in a project, and have your users call the pip install -r requirements.txt command to install the packages described in them.

For instance, you could have a dev-requirements.txt file, which contains extra tools for development, and a prod-requirements.txt, which has production-specific dependencies. The format allows inheritance to help you manage requirements files' collections.
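Such a file can pull in the base list with the -r directive and then add development-only tools; the exact tool names below are just examples:

-r requirements.txt
pytest
pytest-cov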

Using the requirements files duplicates some of the information contained in the setup.py file's install_requires section. As noted earlier, we could read in the requirements.txt file and include the data in setup.py. Some developers deliberately keep these sources separate to distinguish between an application and a library, allowing a library more flexibility in its dependencies in order to co-operate with other installed libraries. This does mean keeping two sources of information up to date, which is often a source of confusion.
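A minimal sketch of reading requirements.txt into install_requires, assuming a plain requirements.txt sitting next to setup.py, could look like this:

    from pathlib import Path

    from setuptools import setup, find_packages

    def read_requirements(path="requirements.txt"):
        # Keep only real requirement lines; skip blanks, comments, and pip
        # options such as "-r other-file.txt".
        lines = Path(path).read_text().splitlines()
        return [line.strip() for line in lines
                if line.strip() and not line.startswith(("#", "-"))]

    setup(
        name="MyProject",
        version="1.0.0",
        packages=find_packages(),
        install_requires=read_requirements(),
    )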

As we said earlier in the chapter, we do not want to make our life complicated by having two different ways to describe Python project dependencies, since the distinction between an application and a library can be quite vague. To avoid duplicating the information in both places, there are some tools in the community that offer some syncing automation between setup.py and requirements files.

The pip-tools (https://github.com/jazzband/pip-tools) tool is one of these utilities, and it generates a requirements.txt file (or any other filename) via a pip-compile CLI, as follows:

$ pip install pip-tools 
... 
$ pip-compile 
#
# This file is autogenerated by pip-compile
# To update, run:
#
#	pip-compile
#
aiofiles==0.6.0
	# via quart
blinker==1.4
	# via quart
click==7.1.2
	# via quart
h11==0.12.0
	# via
	#   hypercorn
	#   wsproto

With no other arguments, pip-compile will examine setup.py. It's also possible to pass it an unpinned version file, such as requirements.in as a list of packages to use instead.
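For example, with a requirements.in file that lists only the top-level dependency, pip-compile writes the pinned result to requirements.txt by default:

$ cat requirements.in
quart
$ pip-compile requirements.in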

Notice that all the dependencies are pinned—the version we want is in the file. This is always a good idea in a production environment, as we want our application to be reproducible. If we do not specify a version to install, then we will get whatever is the latest, and that may break our application. By specifying the version, we know that all the tests we have run will still be valid no matter how far in the future we deploy that version of our app.

It's also a good idea to add the hash of each dependency to the requirements.txt file, as this avoids any issue with someone uploading a changed package without bumping the version number, or a malicious actor replacing an existing version of a package. The hashes are compared against the downloaded files during installation, and the files are only used if the hashes match:

$ pip-compile --generate-hashes
#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --generate-hashes
#
aiofiles==0.6.0 \
    --hash=sha256:bd3019af67f83b739f8e4053c6c0512a7f545b9a8d91aaeab55e6e0f9d123c27 \
    --hash=sha256:e0281b157d3d5d59d803e3f4557dcc9a3dff28a4dd4829a9ff478adae50ca092
    # via quart
blinker==1.4 \
    --hash=sha256:471aee25f3992bd325afa3772f1063dbdbbca947a041b8b89466dc00d606f8b6
    # via quart
click==7.1.2 \
    --hash=sha256:d2b5255c7c6349bc1bd1e59e08cd12acbbd63ce649f2588755783aa94dfb6b1a \
    --hash=sha256:dacca89f4bfadd5de3d7489b7c8a566eee0d3676333fbb50030263894c38c0dc
    # via quart

If you don't use pip-tools, pip has a built-in command called freeze, which you can use to generate a list of all the current versions that are installed in your Python virtual environment. Using pip freeze without a virtual environment is likely to result in a lot of packages that have been used for other projects, rather than just your own work:

$ pip freeze
aiofiles==0.6.0
blinker==1.4
click==7.1.2
h11==0.12.0
h2==4.0.0
hpack==4.0.0
…
... 

The only problem when you pin your dependencies arises when another project has the same dependencies, but pinned to different versions. pip will complain that it cannot satisfy both requirement sets, and you will not be able to install everything. If you are producing a library, and you expect other people to use it and add it to their own lists of dependencies, it is a good idea to specify a range of versions that you support, so that pip can try to sort out any dependency conflicts. For example:

quart>0.13.0,<0.15.0

It's also common practice to leave the dependencies unpinned in the setup.py file and pin the requirements.txt file. That way, pip can install the latest version for each package, and when you deploy, specifically in stage or production, you can refresh the versions by running the pip install -r requirements.txt command. pip will then upgrade/downgrade all the dependencies to match the versions, and if you need to, you can tweak them in the requirements file.

To summarize, defining dependencies should be done in each project's setup.py file, and requirements files can be provided with pinned dependencies if you have a reproducible process to generate them from the setup.py file to avoid duplication.

The next useful file your projects could have is the MANIFEST.in file.

The MANIFEST.in file

When creating a source or binary release, Setuptools will include all the package modules and data files, the setup.py file, and a few other files automatically in the package archive. Files like pip requirements will not be included. To add them to your distribution, you need to add a MANIFEST.in file, which contains the list of files to include.

The file follows a simple glob-like syntax, described at the following link, where you refer to a file or a directory pattern and say whether you want to include or prune the matches: https://docs.python.org/3/distutils/commandref.html#creating-a-source-distribution-the-sdist-command.

Here's an example from Jeeves:

include requirements.txt 
include README.rst 
include LICENSE 
recursive-include myservice *.ini 
recursive-include docs *.rst *.png *.svg *.css *.html conf.py 
prune docs/build/* 

The docs/ directory containing the Sphinx documentation will be integrated into the source distribution, but any artifact generated locally in docs/build/ when the documentation is built will be pruned.

Once you have the MANIFEST.in file in place, all the files should be added in your distribution when your project is released.

A typical microservice project, as described in this book, will have the following list of files:

  • setup.py: The setup file
  • README.rst: The content of the long_description option
  • MANIFEST.in: The MANIFEST template if it is needed
  • A code of conduct, if the code is an open-source project
  • requirements.txt: pip requirement files generated from install_requires
  • docs/: The Sphinx documentation
  • A directory containing the microservice code, which will typically be named after the microservice, or src/

From there, releasing your project consists of creating a source distribution, which is basically an archive of this structure. If you have some C extensions, you can also create a binary distribution.

Before we learn how to create those releases, let's look at how to pick version numbers for your microservices.

Versioning

Python packaging tools do not enforce a specific versioning pattern, although the version field should be one that can be converted using the packaging module into a meaningful version. Let's discuss what counts as a meaningful version number. To understand a versioning scheme, an installer needs to know how to sort and compare versions. The installer needs to be able to parse the string and know whether a version is older than another one.

Some software uses a scheme based on the date of release, like 20210101 if your software was released on January 1, 2021. For some use cases this works perfectly well. If you are practicing Continuous Deployment (CD), where every change that reaches the release branch is pushed to production, then there may be such a large number of changes that fixed version numbers are hard to work with. In that sort of situation, a date-based version, or a version from the version control hash, may work well.

Date- or commit-based versioning won't work very well if you do branched releases. For instance, if your software has a large change in behavior and you need to support the older version for a while as people transition, then having versions 1 and 2 makes things clear, but using dates in this situation will make some of your "version 1" releases appear as if they were more recent than some of the "version 2" releases, and confuse anyone trying to determine what they should install. Some software combines incremental versions and dates for that reason, but it became obvious that using dates was not the best way to handle branches.

There is also the problem of releasing beta, alpha, release candidates, and dev versions. Developers want to have the ability to mark releases as being pre-releases. For instance, when Python is about to ship a new version, it will ship release candidates using an rcX marker so that the community can try it before the final release is shipped, for example, 3.10.0rc1 or 3.10.0rc2.

For a microservice that you are not releasing to the community, using such markers is often unnecessary—but when you start to have people from outside your organization using your software, it may become useful.

Release candidates can be useful if you are about to ship a backward-incompatible version of a project. It's always a good idea to have your users try it out before it's published. For the usual release though, using candidate releases is probably overkill, as publishing a new release when a problem is found is cheap.

pip does a fairly good job of figuring out most patterns, ultimately falling back to some alphanumeric sorting, but the world would be a better place if all projects used the same versioning scheme. PEP 386, and later PEP 440, were written to define a versioning scheme for the Python community. It is derived from the standard MAJOR.MINOR[.PATCH] scheme, which is widely adopted among developers, with some specific rules for pre- and post-release versions.

The Semantic Versioning (SemVer) (http://semver.org/) scheme is another standard that emerged in the community, which is used in many places outside Python. If you use SemVer, you will be compatible with PEP 440 and the pip installer as long as you don't use pre-release markers. For instance, 3.6.0rc2 translates to 3.6.0-rc2 in SemVer.

Unlike PEP 440, SemVer asks that you always provide the three version numbers. For instance, 1.0 should be 1.0.0. The python-semver library will help a great deal with comparing different versions: https://github.com/python-semver/python-semver:

>>> import semver
>>> version1 = semver.parse_version_info('2.2.3-rc2')
>>> version2 = semver.parse_version_info('2.3.1')
>>> version1 < version2
True

For your microservice project, or any Python project for that matter, you should start with the 0.1.0 version to make it clear that it is not yet stable and may change drastically during early development, and that backward compatibility is not guaranteed. From there, you can increment the MINOR number at will until you feel the software is mature enough.

Once maturity has been reached, a common pattern is to release 1.0.0, and then start to follow these rules:

  • MAJOR is incremented when you introduce a backward-incompatible change for the existing API.
  • MINOR is incremented when you add new features that do not break the existing API.
  • PATCH is incremented just for bug fixes.

Being strict about this scheme with the 0.x.x series when the software is in its early phase does not make much sense, because you will make a lot of backward-incompatible changes, and your MAJOR version would reach a high number in no time.

The 1.0.0 release is often emotionally charged for developers. They want it to be the first stable release they will give to the world—that's why it's common to use the 0.x.x versions and bump to 1.0.0 when the software is deemed stable.

For a library, what we call the API is all the public and documented functions and classes one may import and use. For a microservice, there's a distinction between the code API and the HTTP API. You may completely change the whole implementation in a microservice project and still implement the exact same HTTP API. You need to treat those two versions distinctly.

It's important to remember that version numbers are not decimals, or really any form of counting number, and so while it may look like the next version after 3.9 should be 4.0, it does not have to be—3.10 and onward are perfectly acceptable. The numbers are simply a way to order the values and tell which is lower or greater than another.
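A quick way to check how a given pair of versions will be ordered is the packaging library that pip itself relies on; here is a minimal sketch:

    # Requires the "packaging" library (pip install packaging).
    from packaging.version import Version

    # Components are compared numerically, not as decimals: 10 > 9.
    assert Version("3.10") > Version("3.9")
    # Pre-releases sort before the corresponding final release.
    assert Version("1.0.0rc1") < Version("1.0.0")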

Now that we know how to deal with version numbers, let's do some releasing.

Releasing

To release your project, you must build a package that can either be uploaded to a package repository such as PyPI or installed directly wherever it is needed. Python has a build utility that makes this process straightforward.

In the following example, we install the build utility, and then run it in the example project we used earlier in this chapter. The output can be quite long, so only some of it is included below:

$ pip install --upgrade build 
...
$ python -m build
...
running bdist_wheel
running build
installing to build/bdist.macosx-10.15-x86_64/wheel
running install
running install_egg_info
running egg_info
writing MyProject.egg-info/PKG-INFO
writing dependency_links to MyProject.egg-info/dependency_links.txt
writing requirements to MyProject.egg-info/requires.txt
writing top-level names to MyProject.egg-info/top_level.txt
reading manifest file 'MyProject.egg-info/SOURCES.txt'
writing manifest file 'MyProject.egg-info/SOURCES.txt'
Copying MyProject.egg-info to build/bdist.macosx-10.15-x86_64/wheel/MyProject-1.0.0-py3.8.egg-info
running install_scripts
creating build/bdist.macosx-10.15-x86_64/wheel/MyProject-1.0.0.dist-info/WHEEL
creating '/Users/simon/github/PythonMicroservices/CodeSamples/Chapter9/pyproject-example/dist/tmpcqfu71ms/MyProject-1.0.0-py3-none-any.whl' and adding 'build/bdist.macosx-10.15-x86_64/wheel' to it
adding 'MyProject-1.0.0.dist-info/METADATA'
adding 'MyProject-1.0.0.dist-info/WHEEL'
adding 'MyProject-1.0.0.dist-info/top_level.txt'
adding 'MyProject-1.0.0.dist-info/RECORD'
removing build/bdist.macosx-10.15-x86_64/wheel 

The build command reads the information from setup.py and MANIFEST.in, collects all the files, and puts them in an archive. The result is created in the dist directory:

 $ ls dist/
MyProject-1.0.0-py3-none-any.whl	MyProject-1.0.0.tar.gz

Notice that the names of the archives are composed of the name of the project and its version. The .whl archive is a built distribution in the Wheel format, defined in PEP 427, which is currently the best format for distributing Python packages, while the .tar.gz archive is a source distribution (sdist); you may encounter older formats in existing projects. The Wheel archive can be used directly with pip to install the project as follows:

$ pip install dist/MyProject-1.0.0-py3-none-any.whl
Processing ./dist/MyProject-1.0.0-py3-none-any.whl
Collecting quart
  Using cached Quart-0.15.1-py3-none-any.whl (89 kB)
Collecting hypercorn>=0.11.2
  Using cached Hypercorn-0.11.2-py3-none-any.whl (54 kB)
Collecting itsdangerous
  Using cached itsdangerous-2.0.1-py3-none-any.whl (18 kB) 
…
Installing collected packages: hyperframe, hpack, h11, wsproto, priority, MarkupSafe, h2, werkzeug, jinja2, itsdangerous, hypercorn, click, blinker, aiofiles, quart, MyProject
Successfully installed MarkupSafe-2.0.1 MyProject-1.0.0 aiofiles-0.7.0 blinker-1.4 click-8.0.1 h11-0.12.0 h2-4.0.0 hpack-4.0.0 hypercorn-0.11.2 hyperframe-6.0.1 itsdangerous-2.0.1 jinja2-3.0.1 priority-2.0.0 quart-0.15.1 werkzeug-2.0.1 wsproto-1.0.0

Once you have your archive ready, it's time to distribute it.

Distributing

If you are developing in an open-source project, it is good practice to publish your project to PyPI, so that it can be used by a wide range of people. This can be found at: https://pypi.python.org/pypi. If the project is private, or internal to a company, then you may have a package repository for your work that operates in a similar way to PyPI that is only visible to your own organization's infrastructure.

Like most modern language ecosystems, PyPI can be browsed by installers that are looking for releases to download. When you call the pip install <project> command, pip will browse PyPI to see whether that project exists, and whether there are some suitable releases for your platform.

The public name is the name you use in your setup.py file, and you claim it on PyPI when you publish your first release for it. The index works on a first-come, first-served basis, so if the name you have picked is already taken, then you will have to choose another one.

When creating microservices for an application or organization, you can use a common prefix for all your projects' names. It is also possible to set up your own private version of PyPI for projects that should not be released to the wider world. If at all possible, though, it helps everyone to contribute to the open-source community.

At the package level, a prefix can also sometimes be useful to avoid conflicts. Python has a namespace package feature, which allows you to create a top-level package name (like jeeves), and then have packages in separate Python projects, which will end up being installed under the top-level jeeves package.

The effect is that every package gets a common jeeves namespace when you import them, which is quite an elegant way to group your code under the same banner. The feature is available through the pkgutil module from the standard library.

To do this, you just need to create the same top-level directory in every project, with an __init__.py file containing the following lines, and prefix all absolute imports with the top-level name:

from pkgutil import extend_path 
__path__ = extend_path(__path__, __name__) 

For example, in Jeeves, if we decide to release everything under the same namespace, each project can have the same top-level package name. In the tokendealer, it could be as follows:

  • jeeves
    • __init__.py: Contains the extend_path call
    • tokendealer/
    • ... the actual code...

And then in the dataservice directory, like this:

  • jeeves
    • __init__.py: Contains the extend_path call
    • dataservice/
    • ... the actual code...

Both will ship a jeeves top-level namespace, and when pip installs them, the tokendealer and dataservice packages will both end up installed and available underneath the name jeeves:

>>> from jeeves import tokendealer, dataservice

This feature is not that useful in production, where each microservice is deployed in a separate installation, but it does not hurt, and it can be useful if you start to create a lot of libraries that are used across projects. For now, we will make the assumption that each project is independent, and each name is available at PyPI.

To publish the releases at PyPI, you first need to register a new user using the form at https://pypi.org/account/register/, which will look like that shown in Figure 9.1.


Figure 9.1: Creating an account on PyPI

It's also worth registering at the test version of PyPI, as this will let you experiment with uploads and try out all the commands without publishing anything to the real index. Use https://test.pypi.org/account/register/ for an account on the test service.

Python Distutils used to provide register and upload commands for publishing a project to PyPI, but it is better to use Twine (https://github.com/pypa/twine), which comes with a better user interface; install it with the pip install twine command. Note that the current PyPI no longer requires a separate registration step: the project name is claimed the first time you upload a release for it.

Let's upload to the test version of PyPI first, to make sure everything works. After the upload, we give pip some extra arguments so that it knows to use the test version of PyPI, and then fall back to the real package index to resolve the other dependencies:

$ twine upload --repository testpypi dist/*
$ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple jeeves-dataservice

Once we know everything is working, we can upload to the real package index:

$ twine upload dist/* 

From there, your package should appear in the index, with an HTML home page at https://pypi.python.org/pypi/<project>. The pip install <project> command should work!

Now that we know how to package each microservice, let us see how to run them all in the same box for development purposes.

Running all microservices

So far, we have run our Quart applications using the built-in development server, either through the quart command-line wrapper or the run() function. This works well for development, as the app can detect changes to its source code and reload itself, saving time when making changes. However, there are limitations to this, not least of which is that it runs the server in development mode, with extra diagnostics turned on that slow down the server's operation.

Instead, we should run our applications using Hypercorn (https://pgjones.gitlab.io/hypercorn/), an ASGI web server that allows Quart to run to its full potential, supporting HTTP/2, HTTP/3, as well as WebSocket. It's already installed alongside Quart and is very straightforward to use. For our dataservice application, we would run:

$ hypercorn dataservice:app
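Hypercorn accepts the usual server options. For instance, binding to a specific interface and port and running several worker processes could look like the following sketch; check the Hypercorn documentation for the options available in your version:

$ hypercorn --bind 0.0.0.0:5000 --workers 2 dataservice:app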

Hypercorn is one of a long line of WSGI and ASGI servers that aim to serve web applications. If you are searching the Flask documentation when looking into extensions, you may come across mentions of Gunicorn (https://gunicorn.org/), as it is a common equivalent to Hypercorn for synchronous applications, using a worker pool model to provide concurrency, an option we discussed in Chapter 1, Understanding Microservices. For Quart, though, we will stick with Hypercorn.

The last piece of the puzzle is to avoid having to run each console script in a separate Bash window. We want to manage those processes with a single script. Let's see in the next section how we can do this with a process manager.

Process management

Hypercorn specializes in running web apps. To set up a development environment with a few other processes, you have to manage several different Python microservices, a RabbitMQ instance, a database, and whatever else you use. To make life easier in your development environment, you will need a dedicated process manager.

A good option is a tool like Circus (http://circus.readthedocs.io), which can run any kind of process, even when they are not ASGI or WSGI applications. It also has the ability to bind sockets and make them available for the managed processes. In other words, Circus can run a Quart app with several processes, and can also manage some other processes if needed.

Circus is a Python application, so to use it, you can simply run the command pip install circus. Once Circus is installed, it provides a few commands via the entry_points mechanism described earlier. The two principal commands are circusd, which is the process manager, and circusctl, which lets you control the process manager from the command line. Circus uses an INI-like configuration file, where you list the commands to run in dedicated sections, along with the number of processes you want for each one.

Circus can also bind sockets, and let the forked process use them via their file descriptors. When a socket is created on your system, it uses a File Descriptor (FD), which is a system handle a program can use to reach a file or an I/O resource like sockets. A process that is forked from another one inherits all its file descriptors. That is, through this mechanism, all the processes launched by Circus can share the same sockets.

In the following example, two commands are being run. One will run five processes for the Quart application, located in the server.py module, using the virtualenv provided in the virtualenv path, and the second command will run one Redis server process:

[watcher:web]
cmd = hypercorn --bind fd://$(circus.sockets.web) server:app
use_sockets = True
numprocesses = 5
virtualenv = ./venvs/circus-virtualenv/
copy_env = True 
[watcher:redis] 
cmd = /usr/local/bin/redis-server 
use_sockets = False 
numprocesses = 1 
 
[socket:web] 
host = 0.0.0.0 
port = 8000 

The socket:web section describes what host and port to use to bind the TCP socket, and the watcher:web section uses it via the $(circus.sockets.web) variable. When Circus runs, it replaces that variable with the FD value for the socket. To run this script, you can use the circusd command line:

$ circusd myconfig.ini 

For our microservices, using Circus means we can simply create a watcher and a socket section per service and start them all using the circusd command.
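Once circusd is running, the circusctl command lets you inspect and control the watchers. A minimal sketch, assuming the watcher names from the configuration above:

$ circusctl status          # list watchers and their state
$ circusctl incr web        # add one more process to the web watcher
$ circusctl restart redis   # restart the redis watcher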

Circus also offers options to redirect the stdout and stderr streams to log files to facilitate debugging and numerous other features that can be found at https://circus.readthedocs.io/en/latest/for-ops/configuration/.

Summary

In this chapter, we have looked at how to package, release, and distribute each microservice. The current state of the art in Python packaging still requires some knowledge about the legacy tools, and this will be the case for some years until all the ongoing work in Python and PyPA becomes mainstream. But, provided you have a standard, reproducible, and documented way to package and install your microservices, you should be fine.

Having numerous projects to run a single application adds a lot of complexity when you are developing it, and it's important to be able to run all the pieces from within the same box. Tools like pip's development mode and Circus are useful for this, as they allow you to simplify how you run the whole stack, but they still require that you install tools on your system, even if it is inside a virtualenv.

The other issue with running everything from your local computer is that you might not use an operating system that will be used to run your services in production, or you may have some libraries installed for other purposes, which might interfere.

The best way to prevent this problem is to run your stack in an isolated environment. This is what the next chapter will cover: how to run your services inside a container.
