Chapter 5. Writing a Package

This chapter focuses on a repeatable process to write and release Python packages. Its intentions are:

  • To shorten the time needed to set up everything before starting the real work
  • To provide a standardized way to write packages
  • To ease the use of a test-driven development approach
  • To facilitate the releasing process

It is organized into the following four parts:

  • A common pattern for all packages that describes the similarities between all Python packages, and how distutils and setuptools play a central role
  • What namespace packages are and why they can be useful
  • How to register and upload packages in the Python Package Index (PyPI) with emphasis on security and common pitfalls
  • The stand-alone executables as an alternative way to package and distribute Python applications

Creating a package

Python packaging can be a bit overwhelming at first. The main reason for that is the confusion about proper tools for creating Python packages. Once you create your first package, however, you will see that it is not as hard as it looks. Knowing the proper, state-of-the-art packaging tools also helps a lot.

You should know how to create packages even if you are not interested in distributing your code as open source. Knowing how to make your own will give you more insight into the packaging ecosystem and will help you to work with third-party code available on PyPI that you are probably using.

Also, having your closed source project or its components available as source distribution packages can help you to deploy your code in different environments. Advantages of leveraging the Python packaging ecosystem in code deployment will be described in more detail in the next chapter. Here we will focus on proper tools and techniques to create such distributions.

The confusing state of Python packaging tools

The state of Python packaging was very confusing for a long time, and it took many years to bring organization to this topic. Everything started with the distutils package introduced in 1998, which was later enhanced by setuptools in 2003. These two projects started a long and knotted story of forks, alternative projects, and complete rewrites that tried to fix Python's packaging ecosystem once and for all. Unfortunately, most of these attempts never succeeded. The effect was quite the opposite: each new project that aimed to supersede setuptools or distutils only added to the already huge confusion around packaging tools. Some of these forks were merged back into their ancestors (like distribute, which was a fork of setuptools), but some were abandoned (like distutils2).

Fortunately, this state is gradually changing. An organization called Python Packaging Authority (PyPA) was formed to bring back the order and organization to the packaging ecosystem. Python Packaging User Guide (https://packaging.python.org), maintained by PyPA, is the authoritative source of information about the latest packaging tools and best practices. Treat it as the best source of information about packaging and a complementary reading to this chapter. The guide also contains a detailed history of changes and new projects related to packaging, so it will be useful if you already know a bit but want to make sure you still use the proper tools.

Stay away from other popular Internet resources, such as The Hitchhiker's Guide to Packaging. It is old, not maintained, and mostly obsolete. It may be interesting only for historical reasons and the Python Packaging User Guide is in fact a fork of this old resource.

The current landscape of Python packaging thanks to PyPA

PyPA, besides providing an authoritative guide for packaging, also maintains packaging projects and the standardization process for new official aspects of packaging. All of PyPA's projects can be found under a single organization on GitHub: https://github.com/pypa.

Some of them were already mentioned in the book. The most notable are:

  • pip
  • virtualenv
  • twine
  • warehouse

Note that most of them were started outside of this organization and only moved under PyPA patronage as mature and widespread solutions.

Thanks to PyPA's engagement, the eggs format is progressively being abandoned in favor of wheels for built distributions. The future may bring even more fresh air. PyPA is actively working on warehouse, which aims to completely replace the current PyPI implementation. This will be a huge step in packaging history because PyPI is such an old and neglected project that only a few of us can imagine gradually improving it without a total rewrite.

Tool recommendations

Python Packaging User Guide gives a few suggestions on recommended tools for working with packages. They can be generally divided into two groups: tools for installing packages and tools for package creation and distribution.

Utilities from the first group recommended by PyPA were already mentioned in Chapter 1, Current Status of Python, but let's repeat them here for the sake of consistency:

  • Use pip for installing packages from PyPI
  • Use virtualenv or venv for application-level isolation of the Python environment

The Python Packaging User Guide recommendations of tools for package creation and distribution are as follows:

  • Use setuptools to define projects and create source distributions
  • Use wheels in favor of eggs to create built distributions
  • Use twine to upload package distributions to PyPI

Project configuration

It should be obvious that the easiest way to organize the code of big applications is to split it into several packages. This makes the code simpler, and easier to understand, maintain, and change. It also maximizes the reusability of each package. They act like components.

setup.py

The root directory of a package that is to be distributed contains a setup.py script. It defines all the metadata described in the distutils module, combined as arguments in a call to the standard setup() function. Although distutils is a standard library module, it is recommended that you use the setuptools package instead, which provides several enhancements over the standard distutils.

Therefore, the minimum content for this file is:

from setuptools import setup

setup(
    name='mypackage',
)

name gives the full name of the package. From there, the script provides several commands that can be listed with the --help-commands option:

$ python3 setup.py --help-commands
Standard commands:
  build             build everything needed to install
  clean             clean up temporary files from 'build' command
  install           install everything from build directory
  sdist             create a source distribution (tarball, zip file)
  register          register the distribution with the Python package index
  bdist             create a built (binary) distribution
  check             perform some checks on the package
  upload            upload binary package to PyPI

Extra commands:
  develop           install package in 'development mode'
  alias             define a shortcut to invoke one or more commands
  test              run unit tests after in-place build
  bdist_wheel       create a wheel distribution

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

The actual list of commands is longer and can vary depending on the available setuptools extensions. It was truncated to show only those that are most important and relevant to this chapter. Standard commands are the built-in commands provided by distutils, whereas extra commands are the ones created by third-party packages such as setuptools or any other package that defines and registers a new command. One such extra command registered by another package is bdist_wheel provided by the wheel package.

setup.cfg

The setup.cfg file contains default options for commands of the setup.py script. This is very useful if the process for building and distributing the package is more complex and requires many optional arguments to be passed to the setup.py commands. This allows you to store such default parameters in code on a per-project basis. Because the defaults travel with the project, the commands you type stay the same from one project to another, and it is also transparent to users and other team members how your package was built and distributed.

The syntax for the setup.cfg file is the same as provided by the built-in configparser module so it is similar to the popular Microsoft Windows INI files. Here is an example of the setup configuration file that provides some global, sdist, and bdist_wheel command defaults:

[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1

This example configuration ensures that source distributions are always created in two formats (ZIP and TAR) and that built wheel distributions are created as universal wheels (Python version independent). Also, most of the output will be suppressed on every command by the global quiet switch. Note that this is only for demonstration purposes, and suppressing the output for every command by default may not be a reasonable choice.
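
Because setup.cfg follows the configparser syntax, its contents can be inspected with the standard library alone. The following is only a minimal sketch: the helper name load_command_defaults is made up for this illustration, and the configuration is held in a string rather than read from a real project file.

```python
import configparser

# The example configuration from above, kept in a string for the demonstration
SETUP_CFG = """
[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1
"""


def load_command_defaults(text):
    """Return {section: {option: value}} for setup.cfg-style text.

    Hypothetical helper, not part of distutils or setuptools.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


defaults = load_command_defaults(SETUP_CFG)
# every value is read as a string, so list-like options need splitting
print(defaults["sdist"]["formats"].split(","))
```

Each section name corresponds to a setup.py command, and each option in that section to one of the command's arguments.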

MANIFEST.in

When building a distribution with the sdist command, distutils browses the package directory looking for files to include in the archive. distutils will include:

  • All Python source files implied by the py_modules, packages, and scripts options
  • All C source files listed in the ext_modules option
  • Files that match the glob pattern test/test*.py
  • README, README.txt, setup.py, and setup.cfg

In addition, if your package is under Subversion or CVS, sdist will browse folders such as .svn to look for files to include. Integration with other version control systems is also possible through extensions. sdist builds a MANIFEST file that lists all files and includes them in the archive.

Let's say you are not using these version control systems, and need to include more files. Now you can define a template called MANIFEST.in in the same directory as that of setup.py for the MANIFEST file, where you indicate to sdist which files to include.

This template defines one inclusion or exclusion rule per line, for example:

include HISTORY.txt
include README.txt
include CHANGES.txt
include CONTRIBUTORS.txt
include LICENSE
global-include *.txt *.py

The full list of the MANIFEST.in commands can be found in the official distutils documentation.

Most important metadata

Besides the name and the version of the package being distributed, the most important arguments setup can receive are:

  • description: This includes a few sentences to describe the package
  • long_description: This includes a full description that can be in reStructuredText
  • keywords: This is a list of keywords that define the package
  • author: This is the author's name or organization
  • author_email: This is the contact e-mail address
  • url: This is the URL of the project
  • license: This is the license (GPL, LGPL, and so on)
  • packages: This is a list of the names of all packages in the distribution; setuptools provides a small function called find_packages that calculates this
  • namespace_packages: This is a list of namespaced packages
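
To give a feeling for what computing the packages list involves, here is a simplified stand-in written with the standard library only. It is not the setuptools implementation of find_packages; the function name find_package_names and the temporary directory layout are assumptions made purely for this sketch.

```python
import os
import tempfile


def find_package_names(where):
    """Return dotted names of directories under `where` holding __init__.py.

    Simplified illustration only; setuptools' find_packages offers more
    (for example include and exclude patterns).
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(where):
        if dirpath == where:
            continue  # the root itself is the project, not a package
        if '__init__.py' not in filenames:
            dirnames[:] = []  # prune: nothing below is a regular package
            continue
        relative = os.path.relpath(dirpath, where)
        found.append(relative.replace(os.sep, '.'))
    return sorted(found)


# build a throwaway project tree: pkg/, pkg/sub/ and a docs/ folder
with tempfile.TemporaryDirectory() as root:
    for package in ('pkg', os.path.join('pkg', 'sub')):
        os.makedirs(os.path.join(root, package))
        open(os.path.join(root, package, '__init__.py'), 'w').close()
    os.makedirs(os.path.join(root, 'docs'))
    names = find_package_names(root)

print(names)
```

In a real setup.py you would simply pass packages=find_packages() instead of maintaining such a list by hand.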

Trove classifiers

PyPI and distutils provide a solution for categorizing applications with a set of classifiers called trove classifiers. All the classifiers form a tree-like structure. Each classifier is a string in which the namespaces are separated by the :: substring. Their list is provided to the package definition as the classifiers argument to the setup() function. Here is an example list of classifiers for a project available on PyPI (here solrq):

from setuptools import setup

setup(
    name="solrq",
    # (...)

    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.2',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: Implementation :: PyPy',
        'Topic :: Internet :: WWW/HTTP :: Indexing/Search',
    ],
)

They are completely optional in the package definition but provide a useful extension to the basic metadata available in the setup() interface. Among others, trove classifiers may provide information about supported Python versions or systems, the development stage of the project, or the license under which the code is released. Many PyPI users search and browse the available packages by categories so a proper classification helps packages to reach their target.
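
Since every classifier is just a string with ::-separated namespaces, tools can read information out of them with plain string handling. The sketch below assumes nothing beyond that format; the function names split_classifier and supported_python_versions are hypothetical and exist only for this illustration.

```python
# A shortened version of the classifier list shown above
CLASSIFIERS = [
    'Development Status :: 4 - Beta',
    'License :: OSI Approved :: BSD License',
    'Programming Language :: Python :: 2.7',
    'Programming Language :: Python :: 3',
    'Programming Language :: Python :: 3.4',
    'Programming Language :: Python :: Implementation :: PyPy',
]


def split_classifier(classifier):
    """Split a classifier into its namespace parts."""
    return [part.strip() for part in classifier.split('::')]


def supported_python_versions(classifiers):
    """Collect the version part of 'Programming Language :: Python :: X'."""
    versions = []
    for classifier in classifiers:
        parts = split_classifier(classifier)
        # exactly three parts, so the 'Implementation' branch is skipped
        if parts[:2] == ['Programming Language', 'Python'] and len(parts) == 3:
            versions.append(parts[2])
    return versions


print(split_classifier(CLASSIFIERS[1]))
print(supported_python_versions(CLASSIFIERS))
```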

Trove classifiers serve an important role in the whole packaging ecosystem and should never be ignored. There is no organization that verifies package classification, so it is your responsibility to provide proper classifiers for your packages and not introduce chaos to the whole package index.

At the time of writing this book, there are 608 classifiers available on PyPI that are grouped into nine major categories:

  • Development status
  • Environment
  • Framework
  • Intended audience
  • License
  • Natural language
  • Operating system
  • Programming language
  • Topic

New classifiers are added from time to time, so it is possible that these numbers will be different at the time you read it. The full list of currently available trove classifiers is available with the setup.py register --list-classifiers command.

Common patterns

Creating a package for distribution can be a tedious task for inexperienced developers. Most of the metadata that setuptools or distutils accept in their setup() function call can be provided manually, ignoring the fact that this may be available in other parts of the project:

from setuptools import setup

setup(
    name="myproject",
    version="0.0.1",
    description="mypackage project short description",
    long_description="""
        Longer description of mypackage project
        possibly with some documentation and/or
        usage examples
    """,
    install_requires=[
        'dependency1',
        'dependency2',
        'etc',
    ]
)

While this will definitely work, it is hard to maintain in the long term and leaves room for future mistakes and inconsistencies. Neither setuptools nor distutils can automatically pick up metadata from the project sources, so you need to provide it yourself. There are some common patterns in the Python community for solving the most popular problems, such as dependency management, version/readme inclusion, and so on. It is worth knowing at least a few of them because they are so popular that they could be considered packaging idioms.

Automated inclusion of version string from package

PEP 440 (Version Identification and Dependency Specification) specifies a standard for version and dependency specification. It is a long document that covers accepted version specification schemes and how version matching and comparison in Python packaging tools should work. If you are using or plan to use a complex project version numbering scheme, then reading this document is obligatory. If you are using a simple scheme that consists of one, two, three, or more numbers separated by dots, then you can skip reading PEP 440. If you don't know how to choose the proper versioning scheme, I strongly recommend following semantic versioning, which was already mentioned in Chapter 1, Current Status of Python.

The other problem is where to include that version specifier for a package or module. There is PEP 396 (Module Version Numbers), which deals exactly with this problem. Note that it is only informational and has deferred status, so it is not a part of the standards track. Anyway, it describes what seems to be a de facto standard now. According to PEP 396, if a package or module has a version specified, it should be included as a __version__ attribute of a package root (__init__.py) or module file. Another de facto standard is to also include the VERSION attribute that contains the tuple of version parts. This helps users to write compatibility code because such version tuples can be easily compared if the versioning scheme is simple enough.

Many packages available on PyPI follow both standards. Their __init__.py files contain version attributes that look like the following:

# version as tuple for simple comparisons
VERSION = (0, 1, 1)
# string created from tuple to avoid inconsistency
__version__ = ".".join([str(x) for x in VERSION])

The other suggestion of the deferred PEP 396 is that the version provided in the distutils' setup() function should be derived from __version__, or vice versa. The Python Packaging User Guide features multiple patterns for single-sourcing the project version, and each of them has its own advantages and limitations. My personal favorite is rather long and is not included in PyPA's guide, but it has the advantage of limiting the complexity to the setup.py script only. This boilerplate assumes that the version specifier is provided by the VERSION attribute of the package's __init__ module and extracts this data for inclusion in the setup() call. Here is an excerpt from an imaginary package's setup.py script that presents this approach:

import ast
import os

from setuptools import setup


def get_version(version_tuple):
    # additional handling of a,b,rc tags, this can
    # be simpler depending on your versioning scheme
    if not isinstance(version_tuple[-1], int):
        return '.'.join(
            map(str, version_tuple[:-1])
        ) + version_tuple[-1]

    return '.'.join(map(str, version_tuple))

# path to the package's __init__ module in project
# source tree
init = os.path.join(
    os.path.dirname(__file__), 'src', 'some_package', '__init__.py'
)

# read the file inside a with block so that it is closed afterwards
with open(init) as init_file:
    version_line = [
        line for line in init_file if line.startswith('VERSION')
    ][0]

# VERSION is a tuple so we need to evaluate its line of code.
# We could simply import it from the package but we
# cannot be sure that this package is importable before
# finishing its installation. ast.literal_eval() accepts only
# Python literals, so it is safer than a bare eval()
VERSION = get_version(
    ast.literal_eval(version_line.split('=')[-1].strip())
)

setup(
    name='some-package',
    version=VERSION,
    # ...
)
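
To make the tag handling concrete, the same idea can be exercised on source text directly instead of a real file. This is only a sketch: version_from_source is a hypothetical helper, and the two VERSION lines are invented inputs.

```python
import ast


def get_version(version_tuple):
    # same rule as in the excerpt: a trailing non-integer part is a tag
    if not isinstance(version_tuple[-1], int):
        return '.'.join(map(str, version_tuple[:-1])) + version_tuple[-1]
    return '.'.join(map(str, version_tuple))


def version_from_source(source_text):
    """Find the VERSION assignment in module source and format it."""
    for line in source_text.splitlines():
        if line.startswith('VERSION'):
            literal = line.split('=', 1)[1].strip()
            return get_version(ast.literal_eval(literal))
    raise ValueError('no VERSION assignment found')


print(version_from_source("VERSION = (0, 1, 1)\n"))      # 0.1.1
print(version_from_source("VERSION = (1, 0, 'rc1')\n"))  # 1.0rc1
```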

README file

The Python Package Index can display a project's readme or the value of long_description on the package page in the PyPI portal. You can write this description using reStructuredText (http://docutils.sourceforge.net/rst.html) markup, so it will be formatted to HTML on upload. Unfortunately, only reStructuredText is currently available as a documentation markup on PyPI. This is unlikely to change in the near future. More likely, additional markup languages will be supported once the warehouse project completely replaces the current PyPI implementation. Unfortunately, the date of the final release of warehouse is still unknown.

Still, many developers want to use different markup languages for various reasons. The most popular choice is Markdown, which is the default markup language on GitHub, the place where most open source Python development currently happens. So, usually, GitHub and Markdown enthusiasts either ignore this problem or provide two independent documentation texts. Descriptions provided to PyPI are either short versions of what is available on the project's GitHub page or plain unformatted Markdown that does not present well on PyPI.

If you want to use something other than the reStructuredText markup language for your project's README, you can still provide it as a project description on the PyPI page in a readable form. The trick lies in using the pypandoc package to translate your other markup language into reStructuredText while uploading the package to the Python Package Index. It is important to do it with a fallback to the plain content of your readme file, so the installation won't fail if the user does not have pypandoc installed:

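import os

from setuptools import setup

try:
    # convert_file() is the replacement for the deprecated
    # pypandoc convert() function
    from pypandoc import convert_file

    def read_md(f):
        return convert_file(f, 'rst')

except ImportError:
    print(
        "warning: pypandoc module not found, could not convert Markdown to RST"
    )

    def read_md(f):
        # fall back to the raw Markdown content of the readme file
        with open(f, 'r') as readme_file:
            return readme_file.read()

README = os.path.join(os.path.dirname(__file__), 'README.md')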


setup(
    name='some-package',
    long_description=read_md(README),
    # ...
)

Managing dependencies

Many projects require some external packages to be installed and/or used. When the list of dependencies is very long there comes a question as to how to manage it. The answer in most cases is very simple. Do not over-engineer the problem. Keep it simple and provide the list of dependencies explicitly in your setup.py script:

from setuptools import setup
setup(
    name='some-package',
    install_requires=['falcon', 'requests', 'delorean']
    # ...
)

Some Python developers like to use requirements.txt files for tracking lists of dependencies for their packages. In some situations, you might find a reason for doing that, but in most cases this is a relic of times when the code of that project was not properly packaged. Still, even such notable projects as Celery stick to this convention. So if you are not willing to change your habits or you are somehow forced to use requirements files, then at least do it properly. Here is one of the popular idioms for reading the list of dependencies from the requirements.txt file:

from setuptools import setup
import os


def strip_comments(l):
    return l.split('#', 1)[0].strip()


def reqs(*f):
    return list(filter(None, [strip_comments(l) for l in open(
        os.path.join(os.getcwd(), *f)).readlines()]))

setup(
    name='some-package',
    install_requires=reqs('requirements.txt')
    # ...
)
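
To see what this idiom actually produces, the comment stripping can be tried on an in-memory file. The sample requirement lines and the name parse_requirements are made up for this sketch; only the parsing rule is taken from the listing above.

```python
import io


def strip_comments(line):
    # keep only the part of the line before the first '#'
    return line.split('#', 1)[0].strip()


def parse_requirements(lines):
    """Return requirement specifiers with comments and blank lines removed."""
    return [req for req in map(strip_comments, lines) if req]


# stand-in for an open requirements.txt file
SAMPLE = io.StringIO("""\
# web framework
falcon>=1.0  # pinned loosely
requests

delorean
""")

requirements = parse_requirements(SAMPLE)
print(requirements)  # ['falcon>=1.0', 'requests', 'delorean']
```

Note that such naive parsing is only suitable for plain requirement specifiers. Lines that legitimately contain a # character, such as URLs with a fragment, or pip-specific options, would need more careful handling.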

The custom setup command

distutils allows you to create new commands. A new command can be registered with an entry point, which was introduced by setuptools as a simple way to define packages as plug-ins.

An entry point is a named link to a class or a function that is made available through some APIs in setuptools. Any application can scan for all registered packages and use the linked code as a plug-in.

To link the new command, the entry_points metadata can be used in the setup call:

setup(
    name="my.command",
    entry_points="""
        [distutils.commands]
        my_command = my.command.module:Class
    """
)

All named links are gathered in named sections. When distutils is loaded, it scans for links that were registered under distutils.commands.

This mechanism is used by numerous Python applications that provide extensibility.
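
As an illustration of how an application can scan for such named links, the following sketch uses importlib.metadata from the standard library (Python 3.8 and later) rather than the setuptools APIs. The helper name registered_entry_points is an assumption of this example, and what it prints depends entirely on the packages installed in your environment.

```python
from importlib import metadata


def registered_entry_points(group):
    """Return the entry points that installed packages registered in a group."""
    entry_points = metadata.entry_points()
    if hasattr(entry_points, 'select'):
        # newer Python versions return a selectable collection
        return list(entry_points.select(group=group))
    # Python 3.8 and 3.9 return a plain dict keyed by group name
    return list(entry_points.get(group, []))


for entry_point in registered_entry_points('distutils.commands'):
    # entry_point.load() would import and return the linked object
    print(entry_point.name, '->', entry_point.value)
```

A plug-in host would typically call load() on each entry point it finds and then use the returned class or function.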

Working with packages during development

Working with setuptools is mostly about building and distributing packages. However, you still need to know how to use it to install packages directly from project sources. The reason for that is simple: it is good to test whether your packaging code works properly before submitting a package to PyPI, and the simplest way to test it is by installing it. If you send a broken package to the repository, then in order to re-upload it, you need to increase the version number.

Testing if your code is packaged properly before the final distribution saves you from unnecessary version number inflation and obviously from wasted time. Also, installation directly from your own sources using setuptools may be essential when working on multiple related packages at the same time.

setup.py install

The install command installs the package into the Python environment. It will try to build the package if no previous build was made and then inject the result into the Python tree. When a source distribution is provided, it can be uncompressed in a temporary folder and then installed with this command. The install command will also install dependencies that are defined in the install_requires metadata. This is done by looking for the packages in the Python Package Index.

An alternative to the bare setup.py script when installing a package is to use pip. Since it is a tool that is recommended by PyPA, you should use it even when installing a package in your local environment for development purposes. In order to install a package from local sources, run the following command:

pip install <project-path>

Uninstalling packages

Amazingly, setuptools and distutils lack the uninstall command. Fortunately, it is possible to uninstall any Python package using pip:

pip uninstall <package-name>

Uninstalling can be a dangerous operation when attempted on system-wide packages. This is another reason why it is so important to use virtual environments for any development.

setup.py develop or pip -e

Packages installed with setup.py install are copied to the site-packages directory of your current environment. This means that whenever you make a change to the sources of that package, you are required to re-install it. This is often a problem during intensive development because it is very easy to forget about the need to perform the installation again. This is why setuptools provides an extra develop command that allows us to install packages in development mode. This command creates a special link to the project sources in the deployment directory (site-packages) instead of copying the whole package there. Package sources can then be edited without the need for re-installation, and the package is available in sys.path as if it were installed normally.

pip also allows installing packages in such a mode. This installation option is called editable mode and can be enabled with the -e parameter in the install command:

pip install -e <project-path>