This chapter focuses on a repeatable process to write and release Python packages, a process in which distutils and setuptools play a central role.
Python packaging can be a bit overwhelming at first. The main reason is the confusion about the proper tools for creating Python packages. Once you create your first package, though, you will see that it is not as hard as it looks. Knowing the proper, state-of-the-art packaging tools also helps a lot.
You should know how to create packages even if you are not interested in distributing your code as open source. Knowing how to make your own will give you more insight into the packaging ecosystem and will help you to work with third-party code available on PyPI that you are probably using.
Also, having your closed source project or its components available as source distribution packages can help you to deploy your code in different environments. Advantages of leveraging the Python packaging ecosystem in code deployment will be described in more detail in the next chapter. Here we will focus on proper tools and techniques to create such distributions.
The state of Python packaging was very confusing for a long time, and it took many years to bring organization to this topic. Everything started with the distutils package introduced in 1998, which was later enhanced by setuptools in 2003. These two projects started a long and knotted story of forks, alternative projects, and complete rewrites that tried to fix Python's packaging ecosystem once and for all. Unfortunately, most of these attempts never succeeded. The effect was quite the opposite. Each new project that aimed to supersede setuptools or distutils only added to the already huge confusion around packaging tools. Some of these forks were merged back into their ancestors (like distribute, which was a fork of setuptools), but some were left abandoned (like distutils2).
Fortunately, this state is gradually changing. An organization called Python Packaging Authority (PyPA) was formed to bring back the order and organization to the packaging ecosystem. Python Packaging User Guide (https://packaging.python.org), maintained by PyPA, is the authoritative source of information about the latest packaging tools and best practices. Treat it as the best source of information about packaging and a complementary reading to this chapter. The guide also contains a detailed history of changes and new projects related to packaging, so it will be useful if you already know a bit but want to make sure you still use the proper tools.
Stay away from other popular Internet resources, such as The Hitchhiker's Guide to Packaging. It is old, not maintained, and mostly obsolete. It may be interesting only for historical reasons and the Python Packaging User Guide is in fact a fork of this old resource.
PyPA, besides providing an authoritative guide for packaging, also maintains packaging projects and the standardization process for new official aspects of packaging. All of PyPA's projects can be found under a single organization on GitHub: https://github.com/pypa.
Some of them were already mentioned in the book. The most notable are:
pip
virtualenv
twine
warehouse
Note that most of them were started outside of this organization and only moved under PyPA patronage as mature and widespread solutions.
Thanks to PyPA engagement, the progressive abandonment of the eggs format in favor of wheels for built distributions is already happening. The future may bring an even bigger breath of fresh air. PyPA is actively working on warehouse, which aims to completely replace the current PyPI implementation. This will be a huge step in packaging history because pypi is so old and neglected a project that only a few of us can imagine gradually improving it without a total rewrite.
Python Packaging User Guide gives a few suggestions on recommended tools for working with packages. They can be generally divided into two groups: tools for installing packages and tools for package creation and distribution.
Utilities from the first group recommended by PyPA were already mentioned in Chapter 1, Current Status of Python, but let's repeat them here for the sake of consistency:
pip for installing packages from PyPI
virtualenv or venv for application-level isolation of the Python environment
The Python Packaging User Guide recommendations of tools for package creation and distribution are as follows:
setuptools to define projects and create source distributions
twine to upload package distributions to PyPI
It should be obvious that the easiest way to organize the code of big applications is to split it into several packages. This makes the code simpler, and easier to understand, maintain, and change. It also maximizes the reusability of each package. Each package acts like a component.
The root directory of a package that has to be distributed contains a setup.py script. It defines all metadata as described in the distutils module, combined as arguments in a call to the standard setup() function. Although distutils is a standard library module, it is recommended that you use the setuptools package instead, which provides several enhancements to the standard distutils.
Therefore, the minimum content for this file is:
```python
from setuptools import setup

setup(
    name='mypackage',
)
```
name gives the full name of the package. From there, the script provides several commands that can be listed with the --help-commands option:
```
$ python3 setup.py --help-commands
Standard commands:
  build             build everything needed to install
  clean             clean up temporary files from 'build' command
  install           install everything from build directory
  sdist             create a source distribution (tarball, zip file)
  register          register the distribution with the PyP
  bdist             create a built (binary) distribution
  check             perform some checks on the package
  upload            upload binary package to PyPI

Extra commands:
  develop           install package in 'development mode'
  alias             define a shortcut to invoke one or more commands
  test              run unit tests after in-place build
  bdist_wheel       create a wheel distribution

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help
```
The actual list of commands is longer and can vary depending on the available setuptools extensions. It was truncated to show only those that are most important and relevant to this chapter. Standard commands are the built-in commands provided by distutils, whereas extra commands are the ones created by third-party packages such as setuptools or any other package that defines and registers a new command. One such extra command registered by another package is bdist_wheel, provided by the wheel package.
The setup.cfg file contains default options for commands of the setup.py script. This is very useful if the process for building and distributing the package is more complex and requires many optional arguments to be passed to the setup.py commands. This allows you to store such default parameters in code on a per-project basis. This will make your distribution flow independent from the project and also provide transparency about how your package was built and distributed to the users and other team members.
The syntax for the setup.cfg file is the same as provided by the built-in configparser module, so it is similar to the popular Microsoft Windows INI files. Here is an example of the setup configuration file that provides some global, sdist, and bdist_wheel command defaults:
```
[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1
```
This example configuration ensures that source distributions are always created in two formats (ZIP and TAR) and that built wheel distributions are created as universal wheels (Python version independent). Also, most of the output will be suppressed on every command by the global quiet switch. Note that this is only for demonstration purposes, and it may not be a reasonable choice to suppress the output for every command by default.
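Because setup.cfg follows the configparser syntax, the defaults it holds can be inspected from plain Python. The following is a minimal sketch under that assumption: the `load_command_defaults` helper is a hypothetical name introduced here for illustration, and it only reads the text; it does not reproduce how distutils itself applies the options.

```python
import configparser

# The same example configuration as above, kept inline so that the
# sketch does not depend on any file on disk.
SETUP_CFG = """\
[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1
"""


def load_command_defaults(text):
    """Return a {command: {option: value}} mapping parsed from INI text.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


defaults = load_command_defaults(SETUP_CFG)
# Every value is read as a string, so list-like options need splitting.
sdist_formats = defaults["sdist"]["formats"].split(",")
print(sdist_formats)
```

Note that every option value comes back as a string; interpreting `quiet=1` as a flag or `formats` as a list is left to whichever command consumes it.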
When building a distribution with the sdist command, distutils browses the package directory looking for files to include in the archive. distutils will include:
All Python source files implied by the py_modules, packages, and scripts options
All C source files listed in the ext_modules option
Files that match the glob pattern test/test*.py
The files README, README.txt, setup.py, and setup.cfg
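To get a feel for the last two rules, the selection can be approximated with the standard library's fnmatch module. This is only a rough sketch: the candidate file names are made up for illustration, the `included_by_default` helper is hypothetical, and distutils uses its own matching code whose details differ from fnmatch.

```python
from fnmatch import fnmatchcase

# Files picked up by name, and the glob used for test scripts.
DEFAULT_FILES = {"README", "README.txt", "setup.py", "setup.cfg"}
TEST_PATTERN = "test/test*.py"


def included_by_default(path):
    """Return True if path is a default file or matches the test glob.

    Hypothetical helper; it covers only the name-based and glob-based
    rules, not the files implied by setup() options.
    """
    return path in DEFAULT_FILES or fnmatchcase(path, TEST_PATTERN)


# Made-up project layout used only for this illustration.
candidates = [
    "README.txt",
    "setup.py",
    "test/test_core.py",
    "test/helpers.py",
    "docs/index.txt",
]
selected = [path for path in candidates if included_by_default(path)]
print(selected)
```

Files such as test/helpers.py or docs/index.txt fall outside these defaults, which is the situation that the MANIFEST.in template is meant to handle.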
In addition, if your package is under Subversion or CVS version control, sdist will browse folders such as .svn to look for files to include. Integration with other version control systems is also possible through extensions. sdist builds a MANIFEST file that lists all files and includes them in the archive.
Let's say you are not using these version control systems and need to include more files. In that case, you can define a template for the MANIFEST file called MANIFEST.in, placed in the same directory as setup.py, where you tell sdist which files to include.
This template defines one inclusion or exclusion rule per line, for example:
```
include HISTORY.txt
include README.txt
include CHANGES.txt
include CONTRIBUTORS.txt
include LICENSE
recursive-include *.txt *.py
```
The full list of the MANIFEST.in commands can be found in the official distutils documentation.
Besides the name and the version of the package being distributed, the most important arguments setup() can receive are:
description: This includes a few sentences to describe the package
long_description: This includes a full description that can be in reStructuredText
keywords: This is a list of keywords that define the package
author: This is the author's name or organization
author_email: This is the contact e-mail address
url: This is the URL of the project
license: This is the license (GPL, LGPL, and so on)
packages: This is a list of all names in the package; setuptools provides a small function called find_packages that calculates this
namespace_packages: This is a list of namespaced packages
PyPI and distutils provide a solution for categorizing applications with the set of classifiers called trove classifiers. All the classifiers form a tree-like structure. Each classifier is a string in which every namespace is separated by the :: substring. Their list is provided to the package definition as a classifiers argument to the setup() function. Here is an example list of classifiers for some project available on PyPI (here solrq):
```python
from setuptools import setup

setup(
    name="solrq",
    # (...)

    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.2',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: Implementation :: PyPy',
        'Topic :: Internet :: WWW/HTTP :: Indexing/Search',
    ],
)
```
They are completely optional in the package definition but provide a useful extension to the basic metadata available in the setup() interface. Among others, trove classifiers may provide information about supported Python versions or systems, the development stage of the project, or the license under which the code is released. Many PyPI users search and browse the available packages by categories, so a proper classification helps packages to reach their target.
Trove classifiers serve an important role in the whole packaging ecosystem and should never be ignored. There is no organization that verifies package classification, so it is your responsibility to provide proper classifiers for your packages and not introduce chaos to the whole package index.
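The tree-like structure of classifiers is easy to see once the strings are split on the :: separator. The sketch below is only an illustration of that structure: `split_classifier` and `group_by_category` are hypothetical helper names, and the sample list is a subset of the classifiers shown earlier.

```python
def split_classifier(classifier):
    """Split a trove classifier into its namespace parts."""
    return [part.strip() for part in classifier.split("::")]


def group_by_category(classifiers):
    """Group classifiers by their top-level namespace (hypothetical helper)."""
    groups = {}
    for classifier in classifiers:
        top_level, *rest = split_classifier(classifier)
        groups.setdefault(top_level, []).append(" :: ".join(rest))
    return groups


# A subset of the classifiers from the example above.
sample = [
    'Development Status :: 4 - Beta',
    'License :: OSI Approved :: BSD License',
    'Programming Language :: Python',
    'Programming Language :: Python :: 3.4',
]

groups = group_by_category(sample)
for category, values in groups.items():
    print(category, values)
```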
At the time of writing this book, there are 608 classifiers available on PyPI that are grouped into nine major categories:
New classifiers are added from time to time, so it is possible that these numbers will be different at the time you read it. The full list of currently available trove classifiers is available with the setup.py register --list-classifiers command.
Creating a package for distribution can be a tedious task for inexperienced developers. Most of the metadata that setuptools or distutils accept in their setup() function call can be provided manually, ignoring the fact that this may be available in other parts of the project:
```python
from setuptools import setup

setup(
    name="myproject",
    version="0.0.1",
    description="mypackage project short description",
    long_description="""
        Longer description of mypackage project
        possibly with some documentation and/or
        usage examples
    """,
    install_requires=[
        'dependency1',
        'dependency2',
        'etc',
    ]
)
```
While this will definitely work, it is hard to maintain in the long term and leaves room for future mistakes and inconsistencies. Neither setuptools nor distutils can automatically pick up metadata from the project sources, so you need to provide it yourself. There are some common patterns in the Python community for solving the most popular problems, such as dependency management, version/readme inclusion, and so on. It is worth knowing at least a few of them because they are so popular that they could be considered packaging idioms.
The PEP 440 (Version Identification and Dependency Specification) document specifies a standard for version and dependency specification. It is a long document that covers accepted version specification schemes and how version matching and comparison in Python packaging tools should work. If you are using or plan to use a complex project version numbering scheme, then reading this document is obligatory. If you are using a simple scheme that consists of one, two, three, or more numbers separated by dots, then you can skip reading PEP 440. If you don't know how to choose the proper versioning scheme, I strongly recommend following semantic versioning, which was already mentioned in Chapter 1, Current Status of Python.
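A quick way to tell whether a version string belongs to the simple scheme described above (numbers separated by dots) is a regular expression check. This is a minimal sketch, not an implementation of PEP 440: the helper names are hypothetical, and anything beyond plain dotted numbers is simply reported as not simple.

```python
import re

# One or more groups of digits separated by single dots, for example 1.2.3
SIMPLE_VERSION = re.compile(r"\d+(\.\d+)*")


def is_simple_version(version):
    """Return True for dotted-number versions such as '0.1.1'."""
    return SIMPLE_VERSION.fullmatch(version) is not None


def as_tuple(version):
    """Convert a simple version string into a tuple of integers."""
    if not is_simple_version(version):
        raise ValueError("not a simple dotted version: %r" % version)
    return tuple(int(part) for part in version.split("."))


print(is_simple_version("1.2.3"))
print(is_simple_version("1.0rc1"))
print(as_tuple("0.10.0"))
```

A string such as 1.0rc1 fails the check, which is a hint that the fuller rules of PEP 440 apply to it.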
The other problem is where to include that version specifier for a package or module. There is PEP 396 (Module Version Numbers), which deals exactly with this problem. Note that it is only informational and has deferred status, so it is not a part of the standards track. Anyway, it describes what seems to be a de facto standard now. According to PEP 396, if a package or module has a version specified, it should be included as a __version__ attribute of a package root (__init__.py) or module file. Another de facto standard is to also include the VERSION attribute that contains the tuple of version parts. This helps users to write compatibility code because such version tuples can be easily compared if the versioning scheme is simple enough.
So many packages available on PyPI follow both standards. Their __init__.py files contain version attributes that look like the following:
```python
# version as tuple for simple comparisons
VERSION = (0, 1, 1)
# string created from tuple to avoid inconsistency
__version__ = ".".join([str(x) for x in VERSION])
```
The other suggestion of the deferred PEP 396 is that the version provided in the distutils' setup() function should be derived from __version__, or vice versa. Python Packaging User Guide features multiple patterns for single-sourcing a project version, and each of them has its own advantages and limitations. My personal favorite is rather long and is not included in the PyPA's guide, but has the advantage of limiting the complexity to the setup.py script only. This boilerplate assumes that the version specifier is provided by the VERSION attribute of the package's __init__ module and extracts this data for inclusion in the setup() call. Here is the excerpt from some imaginary package's setup.py script that presents this approach:
```python
from setuptools import setup
import ast
import os


def get_version(version_tuple):
    # additional handling of a, b, rc tags; this can
    # be simpler depending on your versioning scheme
    if not isinstance(version_tuple[-1], int):
        return '.'.join(
            map(str, version_tuple[:-1])
        ) + version_tuple[-1]

    return '.'.join(map(str, version_tuple))


# path to the package's __init__ module in the project
# source tree
init = os.path.join(
    os.path.dirname(__file__), 'src', 'some_package', '__init__.py'
)

with open(init) as init_file:
    version_line = [
        line for line in init_file if line.startswith('VERSION')
    ][0]

# VERSION is a tuple, so we need to evaluate its line of code.
# We could simply import it from the package, but we
# cannot be sure that this package is importable before
# finishing its installation. ast.literal_eval() accepts only
# literals, which makes it a safer choice than eval().
VERSION = get_version(
    ast.literal_eval(version_line.split('=')[-1].strip())
)

setup(
    name='some-package',
    version=VERSION,
    # ...
)
```
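To make the behavior of the get_version() helper concrete, the sketch below repeats it on its own and applies it to two made-up version tuples: a final release and a release candidate. The tuples are assumptions chosen for illustration, not versions of any real package.

```python
def get_version(version_tuple):
    # A trailing non-integer part (such as 'a1', 'b2', or 'rc1') is
    # appended directly to the dotted numbers instead of being
    # joined with another dot.
    if not isinstance(version_tuple[-1], int):
        return '.'.join(map(str, version_tuple[:-1])) + version_tuple[-1]
    return '.'.join(map(str, version_tuple))


final_release = get_version((0, 1, 1))
release_candidate = get_version((1, 0, 'rc1'))

print(final_release)       # 0.1.1
print(release_candidate)   # 1.0rc1
```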
Python Package Index can display a project's readme or the value of long_description on the package page in the PyPI portal. You can write this description using reStructuredText (http://docutils.sourceforge.net/rst.html) markup, so it will be formatted to HTML on upload. Unfortunately, only reStructuredText is currently available as a documentation markup on PyPI. This is unlikely to change in the near future. More likely, additional markup languages will be supported when the warehouse project completely replaces the current PyPI implementation. Unfortunately, the final release date of warehouse is still unknown.
Still, many developers want to use different markup languages for various reasons. The most popular choice is Markdown, which is the default markup language on GitHub—the place where most open source Python development currently happens. So, usually, GitHub and Markdown enthusiasts either ignore this problem or provide two independent documentation texts. Descriptions provided to PyPI are either short versions of what is available on the project's GitHub page or it is plain unformatted Markdown that does not present well on PyPI.
If you want to use something other than the reStructuredText markup language for your project's README, you can still provide it as a project description on the PyPI page in a readable form. The trick lies in using the pypandoc package to translate your other markup language into reStructuredText while uploading the package to Python Package Index. It is important to do it with a fallback to the plain content of your readme file, so the installation won't fail if the user does not have pypandoc installed:
```python
import os
from setuptools import setup

try:
    # convert_file() replaces the older pypandoc.convert() helper,
    # which recent pypandoc releases no longer provide
    from pypandoc import convert_file

    def read_md(f):
        return convert_file(f, 'rst')

except ImportError:
    print(
        "warning: pypandoc module not found, could not convert Markdown to RST"
    )

    def read_md(f):
        # fall back to the plain, unconverted content of the file
        with open(f, 'r') as readme_file:
            return readme_file.read()


README = os.path.join(os.path.dirname(__file__), 'README.md')

setup(
    name='some-package',
    long_description=read_md(README),
    # ...
)
```
Many projects require some external packages to be installed and/or used. When the list of dependencies is very long, the question arises of how to manage it. The answer in most cases is very simple. Do not over-engineer the problem. Keep it simple and provide the list of dependencies explicitly in your setup.py script:
```python
from setuptools import setup

setup(
    name='some-package',
    install_requires=['falcon', 'requests', 'delorean'],
    # ...
)
```
Some Python developers like to use requirements.txt files for tracking lists of dependencies for their packages. In some situations, you might find a reason for doing that, but in most cases this is a relic of times when the code of that project was not properly packaged. Anyway, even such notable projects as Celery still stick to this convention. So if you are not willing to change your habits or you are somehow forced to use requirement files, then at least do it properly. Here is one of the popular idioms for reading the list of dependencies from the requirements.txt file:
```python
from setuptools import setup
import os


def strip_comments(line):
    # drop everything after the first '#' and surrounding whitespace
    return line.split('#', 1)[0].strip()


def reqs(*f):
    # read the file relative to the current working directory and
    # keep only the lines that are non-empty after comment removal
    with open(os.path.join(os.getcwd(), *f)) as requirements:
        return list(filter(
            None, [strip_comments(line) for line in requirements]
        ))


setup(
    name='some-package',
    install_requires=reqs('requirements.txt'),
    # ...
)
```
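The comment-stripping idiom above can be tried out without a real project. The sketch below writes a made-up requirements file to a temporary directory and parses it; `parse_requirements` is a hypothetical name, and the sample content is an assumption chosen to show blank lines and both kinds of comments.

```python
import os
import tempfile


def strip_comments(line):
    """Drop a trailing '#' comment and surrounding whitespace."""
    return line.split('#', 1)[0].strip()


def parse_requirements(path):
    """Return the non-empty requirement lines from the file at path."""
    with open(path) as requirements:
        stripped = (strip_comments(line) for line in requirements)
        return [requirement for requirement in stripped if requirement]


# Made-up file content with an inline comment, a blank line,
# and a comment-only line.
SAMPLE = "falcon  # web framework\n\n# only a comment\nrequests>=2.0\n"

with tempfile.TemporaryDirectory() as directory:
    requirements_path = os.path.join(directory, "requirements.txt")
    with open(requirements_path, "w") as sample_file:
        sample_file.write(SAMPLE)
    parsed = parse_requirements(requirements_path)

print(parsed)
```

This simple approach treats every # as the start of a comment and does not understand pip-specific lines such as options or nested requirement files, so it fits only plain lists of package names and version specifiers.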
distutils allows you to create new commands. A new command can be registered with an entry point, which was introduced by setuptools as a simple way to define packages as plug-ins.
An entry point is a named link to a class or a function that is made available through some APIs in setuptools. Any application can scan for all registered packages and use the linked code as a plug-in.
To link the new command, the entry_points metadata can be used in the setup call:
```python
from setuptools import setup

setup(
    name="my.command",
    entry_points="""
        [distutils.commands]
        my_command = my.command.module.Class
    """
)
```
All named links are gathered in named sections. When distutils is loaded, it scans for links that were registered under distutils.commands.
This mechanism is used by numerous Python applications that provide extensibility.
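Scanning for registered entry points can be sketched with the standard library's importlib.metadata module (Python 3.8 and later), used here in place of the setuptools APIs mentioned above. The `find_plugins` helper is a hypothetical name, and the result depends entirely on which packages are installed in the current environment, so the list may well be empty.

```python
from importlib.metadata import entry_points


def find_plugins(group):
    """Return the entry points that installed packages registered under group."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10 and newer expose a select() method
        return list(all_entry_points.select(group=group))
    # Python 3.8 and 3.9 return a plain dict keyed by group name
    return list(all_entry_points.get(group, []))


# List the named links without importing the code they point to.
for entry_point in find_plugins("distutils.commands"):
    print(entry_point.name, "->", entry_point.value)
```

Calling load() on an entry point would import and return the linked class or function; the sketch stops short of that so that it has no side effects.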
Working with setuptools is mostly about building and distributing packages. However, you still need to know how to use it to install packages directly from project sources. The reason for that is simple. It is good to test whether your packaging code works properly before submitting a package to PyPI, and the simplest way to test it is by installing it. If you send a broken package to the repository, then in order to re-upload it, you need to increase the version number.
Testing whether your code is packaged properly before the final distribution saves you from unnecessary version number inflation and, obviously, from wasted time. Also, installation directly from your own sources using setuptools may be essential when working on multiple related packages at the same time.
The install command installs the package into the Python environment. It will try to build the package if no previous build was made and then inject the result into the Python tree. When a source distribution is provided, it can be uncompressed in a temporary folder and then installed with this command. The install command will also install dependencies that are defined in the install_requires metadata. This is done by looking at the packages in the Python Package Index.
An alternative to the bare setup.py script when installing a package is to use pip. Since it is a tool that is recommended by PyPA, you should use it even when installing a package in your local environment for development purposes. In order to install a package from local sources, run the following command:
```
pip install <project-path>
```
Amazingly, setuptools and distutils lack the uninstall command. Fortunately, it is possible to uninstall any Python package using pip:
```
pip uninstall <package-name>
```
Uninstalling can be a dangerous operation when attempted on system-wide packages. This is another reason why it is so important to use virtual environments for any development.
Packages installed with setup.py install are copied to the site-packages directory of your current environment. This means that whenever you make a change to the sources of that package, you are required to re-install it. This is often a problem during intensive development because it is very easy to forget about the need to perform the installation again. This is why setuptools provides an extra develop command that allows us to install packages in development mode. This command creates a special link to project sources in the deployment directory (site-packages) instead of copying the whole package there. Package sources can be edited without the need for re-installation, and the package is available in sys.path as if it were installed normally.
pip also allows installing packages in such a mode. This installation option is called editable mode and can be enabled with the -e parameter in the install command:
```
pip install -e <project-path>
```