There are three main reasons why you might want to run your own index of Python packages:
setuptools. Then, deployment of the new application version is often as simple as running pip install --upgrade my-application.

Code vendoring
Code vendoring is the practice of including the sources of an external package in the source code (repository) of another project. It is usually done when the project's code depends on a specific version of some external package that may also be required by other packages (possibly in a completely different version). For instance, the popular requests package vendors a version of urllib3 in its source tree because it is very tightly coupled to it and is very unlikely to work with any other version of urllib3. An example of a module that is particularly often vendored by others is six. It can be found in the sources of numerous popular projects such as Django (django.utils.six), Boto (boto.vendored.six), or Matplotlib (matplotlib.externals.six).
Although vendoring is practiced even by some large and successful open source projects, it should be avoided if possible. It is justified only in certain circumstances and should not be treated as a substitute for package dependency management.
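To see why vendoring lets a project pin its own version of a dependency, consider the following sketch. It creates two hypothetical modules in a temporary directory: a globally importable tinylib in version 2.0, and a copy of version 1.0 vendored inside a hypothetical myproject package, using the same kind of layout as boto.vendored.six. Both copies can be imported side by side because the vendored one lives under the project's own namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def write(path: Path, text: str = "") -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


def demo() -> Tuple[str, str]:
    """Return (global tinylib version, tinylib version seen by myproject)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical library "tinylib" 2.0, importable by everyone.
        write(root / "tinylib.py", "VERSION = '2.0'\n")
        # Hypothetical project "myproject" vendoring tinylib 1.0 in its own tree.
        write(root / "myproject" / "__init__.py")
        write(root / "myproject" / "vendored" / "__init__.py")
        write(root / "myproject" / "vendored" / "tinylib.py", "VERSION = '1.0'\n")
        # Project code imports its vendored copy explicitly.
        write(root / "myproject" / "core.py",
              "from myproject.vendored import tinylib\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            global_copy = importlib.import_module("tinylib")
            core = importlib.import_module("myproject.core")
            return global_copy.VERSION, core.tinylib.VERSION
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(demo())  # ('2.0', '1.0')
```

The price of this isolation is that the vendored copy never receives upstream fixes unless the project updates it by hand.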
The problem of PyPI outages can be somewhat mitigated by allowing the installation tools to download packages from one of its mirrors. In fact, the official Python Package Index is already served through a CDN (Content Delivery Network), so it is intrinsically mirrored. This does not change the fact that it seems to have bad days from time to time, when every attempt to download a package fails. Using unofficial mirrors is not a solution here because it might raise security concerns.
The best solution is to have your own PyPI mirror that contains all the packages you need. The only party that will use it is you, so it will be much easier to ensure proper availability. Another advantage is that whenever this service goes down, you don't need to rely on someone else to bring it back up. The mirroring tool maintained and recommended by PyPA is bandersnatch (https://pypi.python.org/pypi/bandersnatch). It allows you to mirror the whole content of the Python Package Index, and the mirror can be provided as the index-url option for the repository section in the .pypirc file (as explained in the previous chapter). This mirror does not accept uploads and does not have the web part of PyPI. But beware: a full mirror might require hundreds of gigabytes of storage, and its size will continue to grow over time.
But why stop at a simple mirror when we have a much better alternative? There is a very small chance that you will need a mirror of the whole package index. Even a project with hundreds of dependencies will use only a minor fraction of all the available packages. Also, not being able to upload your own private packages is a huge limitation of such a simple mirror. The added value of using bandersnatch seems very low for such a high price, and this is true in most situations. If the package mirror is to be maintained only for a single project or a few projects, a much better approach is to use devpi (http://doc.devpi.net/). It is a PyPI-compatible package index implementation that provides both:
The main advantage of devpi over bandersnatch is how it handles mirroring. It can of course do a full generic mirror of other indexes, like bandersnatch does, but this is not its default behavior. Instead of doing a rather expensive backup of the whole repository, it maintains mirrors for packages that were already requested by clients. So whenever a package is requested by the installation tool (pip, setuptools, or easy_install) and it does not exist in the local mirror, the devpi server will attempt to download it from the mirrored index (usually PyPI) and serve it. Once the package is downloaded, devpi will periodically check for its updates to keep its mirror fresh.
This mirroring approach leaves a slight risk of failure when you request a new package that has not yet been mirrored while the upstream package index has an outage. However, this risk is reduced by the fact that in most deployments you will depend only on packages that were already mirrored in the index. The mirror state for packages that were already requested is eventually consistent with PyPI, and new versions will be downloaded automatically. This seems to be a very reasonable tradeoff.
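The on-demand mirroring described above, including its failure mode, can be modeled in a few lines. This is a simplified sketch and not devpi's implementation. PullThroughMirror and the fetch_upstream callable are hypothetical names, and in-memory dicts stand in for both the mirror's storage and the upstream index.

```python
from typing import Callable, Dict, Optional

# Hypothetical upstream fetcher: returns package data, or None when the
# package cannot be fetched (for example, during an upstream outage).
Fetcher = Callable[[str], Optional[bytes]]


class PullThroughMirror:
    """Simplified model of on-demand mirroring (not devpi's real code)."""

    def __init__(self, fetch_upstream: Fetcher) -> None:
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}

    def get(self, name: str) -> bytes:
        # Packages requested before are served locally, even during an outage.
        if name in self._cache:
            return self._cache[name]
        # First request: pull the package from upstream and keep a copy.
        data = self._fetch_upstream(name)
        if data is None:
            # The risky case: not mirrored yet and upstream cannot serve it.
            raise LookupError(f"{name!r} is not mirrored and could not be fetched")
        self._cache[name] = data
        return data

    def refresh(self) -> None:
        # Periodic update of already mirrored packages. A failed fetch keeps
        # the old copy, so the mirror is only eventually consistent.
        for name in list(self._cache):
            data = self._fetch_upstream(name)
            if data is not None:
                self._cache[name] = data


if __name__ == "__main__":
    upstream = {"django": b"Django-1.9.2"}
    online = True

    def fetch(name: str) -> Optional[bytes]:
        return upstream.get(name) if online else None

    mirror = PullThroughMirror(fetch)
    print(mirror.get("django"))  # pulled from upstream and cached
    online = False
    print(mirror.get("django"))  # still served while upstream is down
```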
Modern web applications have a lot of dependencies and often require many steps to install properly on the remote host. For instance, the typical bootstrapping process for a new version of the application on a remote host consists of the following steps:
requirements.txt file)

For more complex sites, there might be a lot of additional tasks, mostly related to frontend code:
All of these steps can be easily automated using tools such as Bash, Fabric, or Ansible, but it is not a good idea to do everything on the remote hosts where the application is being installed. Here are the reasons:
For obvious reasons, the results of the mentioned deployment steps can't be included in your application code repository. Simply put, there are things that must be done with every release, and you can't change that. This is obviously a place for proper automation, but the key is to do it in the right place and at the right time.
Most of the things, such as static collection and code/asset preprocessing, can be done locally or in a dedicated environment, so the actual code that is deployed to the remote server requires only a minimal amount of on-site processing. The most notable deployment steps that can be done either in the process of building a distribution or installing a package are:
install command of the setup.py script
compilemessages in Django) can be a part of the sdist/bdist command of the setup.py script

Inclusion of preprocessed code other than Python can be easily handled with the proper MANIFEST.in file. Dependencies are of course best provided as the install_requires argument of the setup() function call from the setuptools package.
Packaging the whole application will of course require some additional work from you, like providing your own custom setuptools commands or overriding the existing ones, but it gives you a lot of advantages and makes project deployment a lot faster and more reliable.
Let's use a Django-based project (using Django 1.9) as an example. I have chosen this framework because it seems to be the most popular Python project of this type, so there is a high chance that you already know it a bit. A typical structure of files in such a project might look like the following:
$ tree . -I __pycache__ --dirsfirst
.
├── webxample
│   ├── conf
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── locale
│   │   ├── de
│   │   │   └── LC_MESSAGES
│   │   │       └── django.po
│   │   ├── en
│   │   │   └── LC_MESSAGES
│   │   │       └── django.po
│   │   └── pl
│   │       └── LC_MESSAGES
│   │           └── django.po
│   ├── myapp
│   │   ├── migrations
│   │   │   └── __init__.py
│   │   ├── static
│   │   │   ├── js
│   │   │   │   └── myapp.js
│   │   │   └── sass
│   │   │       └── myapp.scss
│   │   ├── templates
│   │   │   ├── index.html
│   │   │   └── some_view.html
│   │   ├── __init__.py
│   │   ├── admin.py
│   │   ├── apps.py
│   │   ├── models.py
│   │   ├── tests.py
│   │   └── views.py
│   ├── __init__.py
│   └── manage.py
├── MANIFEST.in
├── README.md
└── setup.py

15 directories, 23 files
Note that this slightly differs from the usual Django project template. By default, the package that contains the WSGI application, the settings module, and the URL configuration has the same name as the project. Because we decided to take the packaging approach, this would be named webxample, just like the top-level package that contains it. This can cause some confusion, so it is better to rename it conf.
Without digging into the possible implementation details, let's just make a few simple assumptions:
djangorestframework and django-allauth, plus one non-Django package: gunicorn.
djangorestframework and django-allauth are provided as INSTALLED_APPS in the webxample.conf.settings module.
gettext messages in the repository.

Knowing the structure of the project, we can write our setup.py script in a way that makes setuptools handle:
webxample/myapp/static/sass
gettext messages under webxample/locale from .po to .mo format
manage.py script

We have a bit of luck here. The Python binding for libsass, a C/C++ port of the SASS engine, provides handy integration with setuptools and distutils. With only a little configuration, it provides a custom setup.py command for running the SASS compilation:
from setuptools import setup

setup(
    name='webxample',
    setup_requires=['libsass >= 0.6.0'],
    sass_manifests={
        'webxample.myapp': ('static/sass', 'static/css')
    },
)
So instead of running the sass command manually or executing a subprocess in the setup.py script, we can type python setup.py build_sass and have our SCSS files compiled to CSS. This is still not enough. It makes our life a bit easier, but we want the whole distribution to be fully automated so that there is only one step for creating new releases. To achieve this goal, we need to override some of the existing setuptools distribution commands.
The example setup.py file that handles some of the project preparation steps through packaging might look like this:
import os

from setuptools import setup
from setuptools import find_packages
from distutils.cmd import Command
from distutils.command.build import build as _build

try:
    from django.core.management.commands.compilemessages import Command as CompileCommand
except ImportError:
    # note: during installation django may not be available
    CompileCommand = None


# this environment variable is required by Django
os.environ.setdefault(
    "DJANGO_SETTINGS_MODULE", "webxample.conf.settings"
)


class build_messages(Command):
    """ Custom command for building gettext messages in Django
    """
    description = """compile gettext messages"""
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        if CompileCommand:
            # an empty locale list means "compile all locales"
            CompileCommand().handle(
                verbosity=2, locale=[], exclude=[]
            )
        else:
            raise RuntimeError("could not build translations")


class build(_build):
    """ Overridden build command that adds additional build steps
    """
    sub_commands = [
        ('build_messages', None),
        ('build_sass', None),
    ] + _build.sub_commands


setup(
    name='webxample',
    setup_requires=[
        'libsass >= 0.6.0',
        'django >= 1.9.2',
    ],
    install_requires=[
        'django >= 1.9.2',
        'gunicorn == 19.4.5',
        'djangorestframework == 3.3.2',
        'django-allauth == 0.24.1',
    ],
    packages=find_packages('.'),
    sass_manifests={
        'webxample.myapp': ('static/sass', 'static/css')
    },
    cmdclass={
        'build_messages': build_messages,
        'build': build,
    },
    entry_points={
        'console_scripts': [
            'webxample = webxample.manage:main',
        ]
    }
)
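The overridden build command works because distutils' build runs the entries of its sub_commands list in order, skipping any whose predicate returns false. The sketch below is a simplified stand-in written for illustration, not the real distutils class. SimpleBuild, WebxampleBuild, and the has_* flags are hypothetical. It reproduces that selection rule to show why prepending build_messages and build_sass with a None predicate makes them always run first.

```python
from typing import Callable, List, Optional, Tuple

# (command name, predicate) pairs; a predicate of None means "always run".
SubCommand = Tuple[str, Optional[Callable[["SimpleBuild"], bool]]]


class SimpleBuild:
    """Simplified stand-in for distutils' build command (not the real class)."""

    has_pure_modules = True   # hypothetical project state
    has_ext_modules = False   # e.g. no C extensions in this project

    sub_commands: List[SubCommand] = [
        ("build_py", lambda cmd: cmd.has_pure_modules),
        ("build_ext", lambda cmd: cmd.has_ext_modules),
    ]

    def get_sub_commands(self) -> List[str]:
        # Same selection rule as distutils.cmd.Command.get_sub_commands():
        # keep a sub-command if it has no predicate or the predicate is true.
        return [name for name, predicate in self.sub_commands
                if predicate is None or predicate(self)]


class WebxampleBuild(SimpleBuild):
    # The trick used in the setup.py script: prepend custom steps without
    # a predicate so they always run before the standard ones.
    sub_commands = [
        ("build_messages", None),
        ("build_sass", None),
    ] + SimpleBuild.sub_commands


if __name__ == "__main__":
    print(WebxampleBuild().get_sub_commands())
    # ['build_messages', 'build_sass', 'build_py']
```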
With such an implementation, we can build all assets and create a source distribution of the package for the webxample project using this single terminal command:
$ python setup.py build sdist
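Because a release now depends on these build steps succeeding, a quick sanity check of the resulting archive can be useful. The helper below is hypothetical (it is not part of setuptools). It opens the sdist with the standard tarfile module and reports any .po catalog without a matching .mo file, as well as a missing CSS output.

```python
import sys
import tarfile
from typing import List


def sdist_problems(sdist_path: str) -> List[str]:
    """List missing compiled assets in a .tar.gz source distribution."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = set(archive.getnames())

    problems = []
    # Every gettext catalog should ship with its compiled .mo counterpart.
    for name in sorted(names):
        if name.endswith(".po") and name[:-3] + ".mo" not in names:
            problems.append(f"missing compiled messages for {name}")
    # At least one style sheet should have been generated from SCSS.
    if not any(name.endswith(".css") for name in names):
        problems.append("no CSS files found")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python check_sdist.py dist/webxample-0.0.0.tar.gz
    issues = sdist_problems(sys.argv[1])
    print("\n".join(issues) or "OK")
    sys.exit(1 if issues else 0)
```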
If you already have your own package index (created with devpi), you can add the upload subcommand or use twine so that this package will be available for installation with pip in your organization. If we look into the structure of the source distribution created with our setup.py script, we can see that it contains the compiled gettext messages and the CSS style sheets generated from SCSS files:
$ tar -xvzf dist/webxample-0.0.0.tar.gz 2> /dev/null
$ tree webxample-0.0.0/ -I __pycache__ --dirsfirst
webxample-0.0.0/
├── webxample
│   ├── conf
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── locale
│   │   ├── de
│   │   │   └── LC_MESSAGES
│   │   │       ├── django.mo
│   │   │       └── django.po
│   │   ├── en
│   │   │   └── LC_MESSAGES
│   │   │       ├── django.mo
│   │   │       └── django.po
│   │   └── pl
│   │       └── LC_MESSAGES
│   │           ├── django.mo
│   │           └── django.po
│   ├── myapp
│   │   ├── migrations
│   │   │   └── __init__.py
│   │   ├── static
│   │   │   ├── css
│   │   │   │   └── myapp.scss.css
│   │   │   └── js
│   │   │       └── myapp.js
│   │   ├── templates
│   │   │   ├── index.html
│   │   │   └── some_view.html
│   │   ├── __init__.py
│   │   ├── admin.py
│   │   ├── apps.py
│   │   ├── models.py
│   │   ├── tests.py
│   │   └── views.py
│   ├── __init__.py
│   └── manage.py
├── webxample.egg-info
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   ├── dependency_links.txt
│   ├── requires.txt
│   └── top_level.txt
├── MANIFEST.in
├── PKG-INFO
├── README.md
├── setup.cfg
└── setup.py

16 directories, 33 files
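The django.mo files in this listing are binary catalogs. Django's compilemessages produces them by calling GNU msgfmt on each .po file. To illustrate what that .po to .mo step outputs, here is a minimal pure-Python sketch of the .mo format. It is modeled on CPython's Tools/i18n/msgfmt.py, writes no hash table, and ignores plural forms. The result is read back with the standard gettext module. The compile_mo name is hypothetical.

```python
import gettext
import io
import struct
from typing import Dict


def compile_mo(messages: Dict[str, str]) -> bytes:
    """Serialize msgid -> msgstr pairs into a minimal little-endian .mo file."""
    keys = sorted(messages)  # GNU gettext expects msgids in sorted order
    ids = strs = b""
    entries = []
    for key in keys:
        msgid = key.encode("utf-8")
        msgstr = messages[key].encode("utf-8")
        entries.append((len(msgid), len(ids), len(msgstr), len(strs)))
        ids += msgid + b"\0"      # strings are NUL-terminated
        strs += msgstr + b"\0"

    count = len(keys)
    header_size = 7 * 4
    ids_start = header_size + count * 16  # after two (length, offset) tables
    strs_start = ids_start + len(ids)

    id_table, str_table = [], []
    for id_len, id_off, str_len, str_off in entries:
        id_table += [id_len, ids_start + id_off]
        str_table += [str_len, strs_start + str_off]

    header = struct.pack(
        "<7I",
        0x950412DE,               # magic number
        0,                        # file format revision
        count,                    # number of strings
        header_size,              # offset of the msgid table
        header_size + count * 8,  # offset of the msgstr table
        0, 0,                     # size and offset of the (omitted) hash table
    )
    tables = struct.pack(f"<{4 * count}I", *id_table, *str_table)
    return header + tables + ids + strs


if __name__ == "__main__":
    catalog = {
        "": "Content-Type: text/plain; charset=UTF-8\n",  # catalog header
        "Hello": "Hallo",
    }
    translations = gettext.GNUTranslations(io.BytesIO(compile_mo(catalog)))
    print(translations.gettext("Hello"))  # Hallo
```

The empty msgid entry carries the catalog header. Without its charset declaration, gettext would decode the strings as ASCII.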
The additional benefit of using this approach is that we were able to provide our own entry point for the project in place of Django's default manage.py script. Now we can run any Django management command using this entry point, for instance:
$ webxample migrate
$ webxample collectstatic
$ webxample runserver
This required a small change in the manage.py script for compatibility with the entry_points argument in setup(), so the main part of its code is wrapped in the main() function:
#!/usr/bin/env python3
import os
import sys


def main():
    os.environ.setdefault(
        "DJANGO_SETTINGS_MODULE", "webxample.conf.settings"
    )

    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)


if __name__ == "__main__":
    main()
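The console_scripts entry is a string in the form name = module:function. On installation, the installer generates a small webxample executable that imports webxample.manage, calls main(), and passes its return value to sys.exit(). The sketch below mimics only the resolution step with importlib. It is not setuptools' API, and both helper names are hypothetical.

```python
import importlib
from typing import Callable, Tuple


def parse_console_script(spec: str) -> Tuple[str, str, str]:
    """Split 'name = package.module:function' into (name, module, attribute)."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module_name, _, attribute = target.partition(":")
    return name, module_name.strip(), attribute.strip()


def load_console_script(spec: str) -> Callable:
    """Import the target module and return the callable the script would run."""
    _, module_name, attribute = parse_console_script(spec)
    target = importlib.import_module(module_name)
    for part in attribute.split("."):  # attributes may be dotted, e.g. Cls.method
        target = getattr(target, part)
    return target


if __name__ == "__main__":
    print(parse_console_script("webxample = webxample.manage:main"))
    # ('webxample', 'webxample.manage', 'main')
    # webxample is not installed here, so resolve a stdlib target instead:
    print(load_console_script("demo = json:dumps") is __import__("json").dumps)
    # True
```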
Unfortunately, a lot of frameworks (including Django) are not designed with the idea of packaging your projects this way in mind. This means that, depending on the complexity of your application, converting it to a package may require a lot of changes. In Django, this often means rewriting many of the implicit imports and updating a lot of configuration variables in your settings file.
The other problem is the consistency of releases created using Python packaging. If different team members are authorized to create application distributions, it is crucial that this process takes place in the same replicable environment. This matters especially when you do a lot of asset preprocessing, because packages created in two different environments may differ even when built from the same code base. This may be due to different versions of the tools used during the build process. The best practice is to move the distribution responsibility to a continuous integration/delivery system such as Jenkins or Buildbot. An additional advantage is that you can ensure the package passes all required tests before it goes to distribution. You can even make automated deployment part of such a continuous delivery system.
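One way to detect such differences is to compare sdists built in two environments by their contents instead of their raw bytes. Tar and gzip headers store timestamps, so two archives with identical files usually still differ byte for byte. The helper below is a hypothetical sketch, not a feature of setuptools or of any CI system. It hashes member names and file contents in sorted order and ignores timestamps and ownership.

```python
import hashlib
import sys
import tarfile


def content_digest(sdist_path: str) -> str:
    """SHA-256 over file names and contents, ignoring timestamps and owners."""
    digest = hashlib.sha256()
    with tarfile.open(sdist_path, "r:gz") as archive:
        files = sorted(
            (member for member in archive.getmembers() if member.isfile()),
            key=lambda member: member.name,
        )
        for member in files:
            # NUL separator keeps name/content boundaries unambiguous.
            digest.update(member.name.encode("utf-8") + b"\0")
            content = archive.extractfile(member).read()
            digest.update(hashlib.sha256(content).digest())
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python compare_sdists.py first/webxample-0.0.0.tar.gz second/webxample-0.0.0.tar.gz
    first, second = (content_digest(path) for path in sys.argv[1:3])
    print("identical" if first == second else "different")
```

Running such a comparison in the CI system makes a divergent build visible before it reaches the package index.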
Although distributing your code as Python packages using setuptools is not simple and effortless, it will greatly simplify your deployments, so it is definitely worth trying. Note that this is also in line with the detailed recommendation of the sixth rule in the Twelve-Factor App: execute the app as one or more stateless processes (http://12factor.net/processes).