Your own package index or index mirror

There are three main reasons why you might want to run your own index of Python packages:

  • The official Python Package Index has no availability guarantees. It is run by the Python Software Foundation and funded by donations, and it does go down from time to time. You don't want a PyPI outage to stop your deployment or packaging process in the middle.
  • It is useful to have reusable components written in Python properly packaged, even for closed source code that will never be published publicly. It simplifies the code base because packages that are used across the company for different projects do not need to be vendored; you can simply install them from the repository. This simplifies maintenance of such shared code and may reduce development costs for the whole company if it has many teams working on different projects.
  • It is very good practice to have your entire project packaged using setuptools. Then, deploying a new version of the application is often as simple as running pip install --upgrade my-application.

    Tip

    Code vendoring

    Code vendoring is the practice of including the sources of an external package in the source code (repository) of other projects. It is usually done when the project's code depends on a specific version of some external package that may also be required by other packages (in a completely different version). For instance, the popular requests package vendors some version of urllib3 in its source tree because it is very tightly coupled to it and is very unlikely to work with any other version of urllib3. An example of a module that is particularly often vendored by others is six. It can be found in the sources of numerous popular projects such as Django (django.utils.six), Boto (boto.vendored.six), or Matplotlib (matplotlib.externals.six).

    Although vendoring is practiced even by some large and successful open source projects, it should be avoided if possible. It has justifiable uses only in certain circumstances and should not be treated as a substitute for proper dependency management.

PyPI mirroring

The problem of PyPI outages can be somewhat mitigated by allowing the installation tools to download packages from one of its mirrors. In fact, the official Python Package Index is already served through a CDN (Content Delivery Network), so it is intrinsically mirrored. This does not change the fact that it has some bad days from time to time, when any attempt to download a package fails. Using unofficial mirrors is not a solution here because it raises security concerns.

The best solution is to have your own PyPI mirror that has all the packages you need. The only party using it is you, so it is much easier to ensure proper availability. The other advantage is that whenever this service goes down, you don't need to rely on someone else to bring it back up. The mirroring tool maintained and recommended by PyPA is bandersnatch (https://pypi.python.org/pypi/bandersnatch). It allows you to mirror the whole content of the Python Package Index, and the resulting mirror can be provided as the index-url option for the repository section of the .pypirc file (as explained in the previous chapter). This mirror does not accept uploads and does not have the web part of PyPI. Beware, though: a full mirror requires hundreds of gigabytes of storage, and its size will continue to grow over time.
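
For illustration, a minimal bandersnatch configuration might look similar to the following sketch (the storage path and worker count are assumptions; consult the bandersnatch documentation for the full set of options):

[mirror]
; where the mirrored packages will be stored (an assumed path)
directory = /srv/pypi
; the index to mirror
master = https://pypi.python.org
; number of download threads
workers = 3

The mirroring itself would then be started with something like bandersnatch --config bandersnatch.conf mirror.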

But why stop at a simple mirror when there is a much better alternative? There is very little chance that you will ever need a mirror of the whole package index. Even a project with hundreds of dependencies uses only a minor fraction of all the available packages. Also, not being able to upload your own private packages is a huge limitation of such a simple mirror. It seems that the added value of using bandersnatch is very low for such a high price, and this is true in most situations. If the package mirror is to be maintained for only a single project or a few projects, a much better approach is to use devpi (http://doc.devpi.net/). It is a PyPI-compatible package index implementation that provides both of the following (a short example session is sketched after this list):

  • A private index to upload nonpublic packages
  • Index mirroring
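
To give a rough idea of the workflow, a minimal devpi session might look like the following sketch (the port is devpi's default, while the dev index name and the empty root password are assumptions that may differ between devpi versions):

$ pip install devpi-server devpi-client
$ devpi-server --start
$ devpi use http://localhost:3141
$ devpi login root --password=''
$ devpi index -c dev bases=root/pypi
$ devpi use root/dev
$ devpi upload

The dev index created here inherits from root/pypi, devpi's built-in mirroring index, so it can serve both your private uploads and packages mirrored from PyPI.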

The main advantage of devpi over bandersnatch is how it handles mirroring. It can, of course, do a full generic mirror of other indexes, as bandersnatch does, but this is not its default behavior. Instead of making a rather expensive copy of the whole repository, it maintains mirrors of packages that have already been requested by clients. So whenever a package is requested by an installation tool (pip, setuptools, or easy_install), if it does not exist in the local mirror, the devpi server will attempt to download it from the mirrored index (usually PyPI) and serve it. Once the package is downloaded, devpi will periodically check for updates to keep its mirror state fresh.

The mirroring approach leaves a slight risk of failure when you request a new package that has not yet been mirrored while the upstream package index is having an outage. This risk is reduced by the fact that in most deployments you will depend only on packages that were already mirrored in the index. The mirror state for packages that were already requested is eventually consistent with PyPI, and new versions will be downloaded automatically. This seems to be a very reasonable tradeoff.
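
Assuming a devpi instance listening on its default port, pointing pip directly at the mirroring index might look like this (root/pypi is devpi's built-in mirror index; the localhost host name is an assumption of this sketch):

$ pip install --index-url http://localhost:3141/root/pypi/+simple/ requests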

Deployment using a package

Modern web applications have a lot of dependencies and often require a lot of steps to install properly on the remote host. For instance, the typical bootstrapping process for a new version of the application on a remote host consists of the following steps (a rough shell sketch follows the list):

  • Create a new virtual environment for isolation
  • Move the project code to the execution environment
  • Install the latest project requirements (usually from the requirements.txt file)
  • Synchronize or migrate the database schema
  • Collect static files from project sources and external packages to the desired location
  • Compile localization files for applications available in different languages
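
Expressed as shell commands, such a manual bootstrap might look roughly like the following sketch (the paths and the requirements.txt location are assumptions; migrate, collectstatic, and compilemessages are standard Django management commands):

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python manage.py migrate
$ python manage.py collectstatic
$ python manage.py compilemessages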

For more complex sites, there might be a lot of additional tasks, mostly related to frontend code:

  • Generate CSS files using preprocessors such as SASS or LESS
  • Perform minification, obfuscation, and/or concatenation of static files (JavaScript and CSS files)
  • Compile code written in JavaScript superset languages (CoffeeScript, TypeScript, and so on) to native JS
  • Preprocess response template files (minification, style inlining, and so on)

All of these steps can be easily automated using tools such as Bash, Fabric, or Ansible but it is not a good idea to do everything on remote hosts where the application is being installed. Here are the reasons:

  • Some of the popular tools for processing static assets can be either CPU- or memory-intensive. Running them in production environments can destabilize your application execution.
  • These tools very often require additional system dependencies that may not be needed for the normal operation of your projects. These are mostly additional runtime environments such as JVM, Node, or Ruby. This adds complexity to configuration management and increases overall maintenance costs.
  • If you are deploying your application to multiple servers (tens, hundreds, or thousands), you are simply repeating a lot of work that could be done once. If you have your own infrastructure, you may not experience a huge increase in costs, especially if you perform deployments in periods of low traffic. But if you use cloud computing services with a pricing model that charges extra for spikes in load or generally for execution time, this additional cost may be substantial at the proper scale.
  • Most of these steps just take a lot of time. You are installing your code on a remote server, so the last thing you want is to have your connection interrupted by some network issue. By keeping the deployment process quick, you lower the chance of deployment interruption.

For obvious reasons, the results of these deployment steps can't be included in your application code repository. There are simply things that must be done with every release, and you can't change that. It is obviously a place for proper automation, but the key is to do it in the right place and at the right time.

Most of the things, such as static collection and code/asset preprocessing, can be done locally or in a dedicated environment, so the actual code that is deployed to the remote server requires only a minimal amount of on-site processing. The most notable deployment steps, either in the process of building a distribution or installing a package, are:

  • Installation of Python dependencies and transferring static assets (CSS files and JavaScript) to the desired location can be handled as a part of the install command of the setup.py script
  • Preprocessing code (processing JavaScript supersets, minification/obfuscation/concatenation of assets, and running SASS or LESS) and things such as localized text compilation (for example, compilemessages in Django) can be a part of the sdist/bdist command of the setup.py script

Inclusion of preprocessed code other than Python can easily be handled with a proper MANIFEST.in file. Dependencies are of course best provided as the install_requires argument of the setup() function call from the setuptools package.
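
For illustration, a MANIFEST.in for a project that ships compiled translations and generated style sheets might look similar to this sketch (the myproject paths and file patterns are hypothetical):

include README.md
recursive-include myproject/locale *.mo
recursive-include myproject/static *.css
recursive-include myproject/templates *.html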

Packaging the whole application will of course require some additional work from you, such as providing your own custom setuptools commands or overriding existing ones, but it gives you a lot of advantages and makes project deployment a lot faster and more reliable.

Let's use a Django-based project (using Django 1.9) as an example. I have chosen this framework because it seems to be the most popular Python project of this type, so there is a high chance that you already know it a bit. A typical structure of files in such a project might look like the following:

$ tree . -I __pycache__ --dirsfirst
.
├── webxample
│   ├── conf
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── locale
│   │   ├── de
│   │   │   └── LC_MESSAGES
│   │   │       └── django.po
│   │   ├── en
│   │   │   └── LC_MESSAGES
│   │   │       └── django.po
│   │   └── pl
│   │       └── LC_MESSAGES
│   │           └── django.po
│   ├── myapp
│   │   ├── migrations
│   │   │   └── __init__.py
│   │   ├── static
│   │   │   ├── js
│   │   │   │   └── myapp.js
│   │   │   └── sass
│   │   │       └── myapp.scss
│   │   ├── templates
│   │   │   ├── index.html
│   │   │   └── some_view.html
│   │   ├── __init__.py
│   │   ├── admin.py
│   │   ├── apps.py
│   │   ├── models.py
│   │   ├── tests.py
│   │   └── views.py
│   ├── __init__.py
│   └── manage.py
├── MANIFEST.in
├── README.md
└── setup.py

15 directories, 23 files

Note that this slightly differs from the usual Django project template. By default, the package that contains the WSGI application, the settings module, and the URL configuration has the same name as the project. Because we decided to take the packaging approach, this would be named webxample. This can cause some confusion, so it is better to rename it to conf.

Without digging into the possible implementation details, let's just make a few simple assumptions:

  • Our example application has some external dependencies. Here, these will be two popular Django packages, djangorestframework and django-allauth, plus one non-Django package, gunicorn.
  • djangorestframework and django-allauth are listed in INSTALLED_APPS in the webxample.conf.settings module.
  • The application is localized in three languages (German, English, and Polish) but we don't want to store the compiled gettext messages in the repository.
  • We are tired of vanilla CSS syntax, so we decided to use the more powerful SCSS language that we translate to CSS using SASS.

Knowing the structure of the project, we can write our setup.py script in a way that makes setuptools handle the following:

  • Compilation of SCSS files under webxample/myapp/static/sass
  • Compilation of gettext messages under webexample/locale from .po to .mo format
  • Installation of requirements
  • A new script that provides an entry point to the package, so we will have a custom command instead of the manage.py script

We have a bit of luck here. The Python binding for libsass, a C/C++ port of the SASS engine, provides handy integration with setuptools and distutils. With only a little configuration, it provides a custom setup.py command for running the SASS compilation:

from setuptools import setup

setup(
    name='webxample',
    setup_requires=['libsass >= 0.6.0'],
    sass_manifests={
        'webxample.myapp': ('static/sass', 'static/css')
    },
)

So, instead of running the sass command manually or executing a subprocess in the setup.py script, we can type python setup.py build_sass and have our SCSS files compiled to CSS. This is still not enough. It makes our life a bit easier, but we want the whole distribution to be fully automated so that there is only one step for creating new releases. To achieve this goal, we have to override some of the existing setuptools distribution commands.

The example setup.py file that handles some of the project preparation steps through packaging might look like this:

import os

from setuptools import setup
from setuptools import find_packages
from distutils.cmd import Command
from distutils.command.build import build as _build

try:
    from django.core.management.commands.compilemessages import (
        Command as CompileCommand
    )
except ImportError:
    # note: during installation django may not be available
    CompileCommand = None


# this environment variable is required
os.environ.setdefault(
    "DJANGO_SETTINGS_MODULE", "webxample.conf.settings"
)


class build_messages(Command):
    """ Custom command for building gettext messages in Django
    """
    description = """compile gettext messages"""
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        if CompileCommand:
            CompileCommand().handle(
                verbosity=2, locales=[], exclude=[]
            )
        else:
            raise RuntimeError("could not build translations")


class build(_build):
    """ Overriden build command that adds additional build steps
    """
    sub_commands = [
        ('build_messages', None),
        ('build_sass', None),
    ] + _build.sub_commands


setup(
    name='webxample',
    setup_requires=[
        'libsass >= 0.6.0',
        'django >= 1.9.2',
    ],
    install_requires=[
        'django >= 1.9.2',
        'gunicorn == 19.4.5',
        'djangorestframework == 3.3.2',
        'django-allauth == 0.24.1',
    ],
    packages=find_packages('.'),
    sass_manifests={
        'webxample.myapp': ('static/sass', 'static/css')
    },
    cmdclass={
        'build_messages': build_messages,
        'build': build,
    },
    entry_points={
        'console_scripts': [
            'webxample = webxample.manage:main',
        ]
    }
)

With such an implementation, we can build all assets and create a source distribution of the webxample project using this single terminal command:

$ python setup.py build sdist

If you already have your own package index (created with devpi), you can add the upload subcommand or use twine so that this package will be available for installation with pip in your organization.
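
For example, an upload with twine might look as follows (here, company-index is a hypothetical repository alias defined in the .pypirc file mentioned earlier; devpi users can achieve the same with the devpi upload command):

$ twine upload -r company-index dist/webxample-0.0.0.tar.gz

If we look into the structure of the source distribution created with our setup.py script, we can see that it contains the compiled gettext messages and the CSS style sheets generated from the SCSS files: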

$ tar -xvzf dist/webxample-0.0.0.tar.gz 2> /dev/null
$ tree webxample-0.0.0/ -I __pycache__ --dirsfirst
webxample-0.0.0/
├── webxample
│   ├── conf
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── locale
│   │   ├── de
│   │   │   └── LC_MESSAGES
│   │   │       ├── django.mo
│   │   │       └── django.po
│   │   ├── en
│   │   │   └── LC_MESSAGES
│   │   │       ├── django.mo
│   │   │       └── django.po
│   │   └── pl
│   │       └── LC_MESSAGES
│   │           ├── django.mo
│   │           └── django.po
│   ├── myapp
│   │   ├── migrations
│   │   │   └── __init__.py
│   │   ├── static
│   │   │   ├── css
│   │   │   │   └── myapp.scss.css
│   │   │   └── js
│   │   │       └── myapp.js
│   │   ├── templates
│   │   │   ├── index.html
│   │   │   └── some_view.html
│   │   ├── __init__.py
│   │   ├── admin.py
│   │   ├── apps.py
│   │   ├── models.py
│   │   ├── tests.py
│   │   └── views.py
│   ├── __init__.py
│   └── manage.py
├── webxample.egg-info
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   ├── dependency_links.txt
│   ├── requires.txt
│   └── top_level.txt
├── MANIFEST.in
├── PKG-INFO
├── README.md
├── setup.cfg
└── setup.py

16 directories, 33 files

The additional benefit of using this approach is that we were able to provide our own entry point for the project in place of Django's default manage.py script. Now we can run any Django management command using this entry point, for instance:

$ webxample migrate
$ webxample collectstatic
$ webxample runserver

This required a small change in the manage.py script for compatibility with the entry_points argument in setup(), so the main part of its code is wrapped in a main() function:

#!/usr/bin/env python3
import os
import sys


def main():
    os.environ.setdefault(
        "DJANGO_SETTINGS_MODULE", "webxample.conf.settings"
    )

    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)


if __name__ == "__main__":
    main()

Unfortunately, a lot of frameworks (including Django) are not designed with packaging your projects that way in mind. It means that, depending on how advanced your application is, converting it to a package may require a lot of changes. In Django, this often means rewriting many of the implicit imports and updating a lot of configuration variables in your settings file.

The other problem is the consistency of releases created using Python packaging. If different team members are authorized to create application distributions, it is crucial that this process takes place in the same replicable environment. Especially when you do a lot of asset preprocessing, packages created in two different environments may not look the same, even if they are created from the same code base. This may be due to different versions of the tools used during the build process. The best practice is to move the distribution responsibility to a continuous integration/delivery system such as Jenkins or Buildbot. The additional advantage is that you can assert that the package passes all the required tests before distribution. You can even make automated deployment a part of such a continuous delivery system.

Although distributing your code as Python packages using setuptools is not simple and effortless, it will greatly simplify your deployments, so it is definitely worth trying. Note that this is also in line with the detailed recommendation of the sixth rule of the Twelve-Factor App: execute the app as one or more stateless processes (http://12factor.net/processes).
