Building RPM-based repositories in Pulp

Although installing Pulp is quite a complex process, once it is installed, the process of managing repositories is incredibly straightforward. However, it does require a little knowledge of the repository structure for your chosen Linux distribution. Let's continue with the CentOS 7 build that we have been using as an example throughout this book. 

The core CentOS 7 repositories are split into two. First, there is the OS repository, which contains all of the files for the latest point release of CentOS 7 (7.6 at the time of writing). This was last updated in November 2018 and will remain static until CentOS 7.7 is released. The updates for this release are held in a separate repository, so to build a fully functional mirror for CentOS 7 on our Pulp server, we need to mirror both of these paths.

Let's start by creating a mirror of the base operating system:

  1. The first step is to log into the pulp-admin client, as we demonstrated at the end of the previous section. Then, from there, we run the following command to create a new repository:
$ pulp-admin rpm repo create --repo-id='centos76-os' --relative-url='centos76-os' --feed=http://mirror.centos.org/centos/7/os/x86_64/

Let's break that command down:

    • rpm repo create: This set of keywords tells the Pulp server to create a new RPM-based repository definition. Note that nothing is synchronized or published at this stage—this is simply creating metadata for a new repository.
    • --repo-id='centos76-os': This tells Pulp that the ID of our new repository is centos76-os. This is like a unique key and should be used to differentiate your new repository from others.
    • --relative-url='centos76-os': This instructs Pulp where to publish the repository—RPM-based repositories are published at http(s)://pulp-server-address/pulp/repos/<relative-url>.
    • --feed=http://mirror.centos.org/centos/7/os/x86_64/: This is the upstream location from which RPM-based content will be synchronized.
  2. With our repository definition created, the next step is to synchronize the packages from the upstream server. This is as simple as running this command:
$ pulp-admin rpm repo sync run --repo-id='centos76-os'
  3. This kicks off an asynchronous command that runs in the background on the server. You can check the status at any time using this command:
$ pulp-admin rpm repo sync status --repo-id='centos76-os'
  4. Finally, once the synchronization is completed, the repository must be published. This effectively makes the synchronized content available over the Apache web server installed as part of the Pulp installation earlier:
$ pulp-admin rpm repo publish run --repo-id='centos76-os'
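
The create, sync, and publish steps above lend themselves to a small wrapper script. The following is a minimal sketch rather than a definitive implementation: the run and mirror_repo function names and the DRY_RUN variable are assumptions introduced here and are not part of Pulp. It also assumes that you have already logged in with pulp-admin and that the sync run command stays in the foreground until the task completes (if it does not in your environment, check the sync status before publishing). By default, it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: create, sync, and publish one Pulp RPM repository in a single call.
# run, mirror_repo, and DRY_RUN are hypothetical names, not part of Pulp.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# mirror_repo REPO_ID FEED_URL
# Uses the repository ID as the relative URL, as in the example above.
mirror_repo() {
    local repo_id="$1" feed="$2"
    run pulp-admin rpm repo create --repo-id="$repo_id" --relative-url="$repo_id" --feed="$feed"
    run pulp-admin rpm repo sync run --repo-id="$repo_id"
    run pulp-admin rpm repo publish run --repo-id="$repo_id"
}

mirror_repo centos76-os http://mirror.centos.org/centos/7/os/x86_64/
```

Setting DRY_RUN=0 in the environment would execute the commands instead of printing them, which makes it easy to review exactly what will be sent to the Pulp server first.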

With this completed, we have an internal snapshot of the upstream CentOS 7.6 OS repository defined by the --feed parameter, which will remain constant on our Pulp server even when CentOS 7.7 is released.

Now, of course, we also need updates to ensure we get the latest security patches, bug fixes, and so on. How often you update your repositories will depend upon your patching cycle, internal security policies, and so on. Either way, the updates live in their own upstream path, so we will define a second repository to house the update packages.

We will issue an almost identical set of commands to the preceding ones to create the updates repository, only this time there are two key differences:

  • We are using the /updates/ path for the feed rather than /os/.
  • We have put a date stamp into repo-id and relative-url. You could, of course, adopt your own versioning scheme here; however, as this repository will be a snapshot of all CentOS 7 updates to August 7, 2019, using the date of the snapshot as an identifier is one sensible approach:
$ pulp-admin rpm repo create --repo-id='centos7-07aug19' --relative-url='centos7-07aug19' --feed=http://mirror.centos.org/centos/7/updates/x86_64/
$ pulp-admin rpm repo sync run --repo-id='centos7-07aug19'
$ pulp-admin rpm repo publish run --repo-id='centos7-07aug19'
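
If you adopt the date-stamp naming scheme, the identifier can be generated rather than typed by hand. The following is a small sketch under stated assumptions: snapshot_id is a hypothetical helper name, the centos7- prefix simply mirrors the example above, and the -d option requires GNU date.

```shell
#!/usr/bin/env bash
# Sketch: build a date-stamped repository ID such as centos7-07aug19.
# snapshot_id is a hypothetical helper; it relies on GNU date for the -d option.
set -eu

snapshot_id() {
    # $1: date in YYYY-MM-DD form (defaults to today)
    local day="${1:-$(date +%F)}"
    local stamp
    # LC_ALL=C keeps the month abbreviation in English regardless of locale.
    stamp="$(LC_ALL=C date -d "$day" +%d%b%y)"
    printf 'centos7-%s\n' "$(printf '%s' "$stamp" | tr '[:upper:]' '[:lower:]')"
}

snapshot_id 2019-08-07   # prints centos7-07aug19
```

The same value can then be passed to both --repo-id and --relative-url so that the two never drift apart.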

With these commands run, we can then use the pulp-admin client to inspect the repositories and check the disk usage. At present, we can see that the Pulp filesystem has 33 GB used, though not all of this is for CentOS as there are other repositories on this test system. This figure will become important shortly.

In an enterprise environment, a good practice would be to build or update a set of test CentOS 7 systems to this August 7 snapshot and perform the requisite testing on them to ensure confidence in the build. This is especially important in physical systems where kernel changes could cause issues. Once confidence has been established in this build, it becomes the baseline for all CentOS 7 systems. The great thing about this for an enterprise scenario is that all systems (provided they use the Pulp repository) will have the same versions of all packages. This, combined with good automation practices, as we have discussed throughout this book so far, brings almost Docker-like stability and platform confidence to a Linux environment.

Building on this scenario, suppose that overnight a critical security patch is released for CentOS 7. As important as it is to apply this patch in a timely manner, it is also important to test it to ensure it doesn't break any existing services. As a result, we do not wish to update our centos7-07aug19 repository mirror, as this is a known stable snapshot (in other words, we have tested it and are happy that it is stable within our enterprise environment).

If we were just using the upstream internet-facing repositories, then we would have no control over this and our CentOS 7 servers would blindly pick up the patch the next time an update was run. Equally, if we were manually building repository mirrors using a tool such as reposync, we would have one of two choices. First, we could update our existing mirror, which would cost us little disk space, but would bring the same problems as using the upstream repositories (that is, all servers pick up the new patch as soon as an update is run). Alternatively, we could create a second snapshot for testing purposes. I estimated that mirroring the CentOS 7 updates on the Pulp server required approximately 16 GB of disk space, so a second full snapshot would bring the total to around 32 GB. As time goes on, each further snapshot would add roughly the same amount again, which is incredibly inefficient.
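
To make that growth concrete, the arithmetic can be sketched as follows. This is only a rough illustration: it assumes every full copy costs the same 16 GB estimated above, and full_copy_gb is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough disk cost of keeping N independent full copies of the updates mirror.
# SNAPSHOT_GB is the estimate from the text; full_copy_gb is a hypothetical helper.
set -eu

SNAPSHOT_GB=16

full_copy_gb() {
    echo $(( $1 * SNAPSHOT_GB ))
}

for n in 1 2 3 4; do
    printf '%s snapshot(s): about %s GB\n' "$n" "$(full_copy_gb "$n")"
done
```

Two copies come to the 32 GB mentioned above, and the total keeps climbing linearly with every snapshot that is retained.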

This is where Pulp really shines—not only can it create and manage RPM-based repositories in an efficient manner, but it also knows not to download packages that it already has on a sync operation and not to duplicate packages on a publish—hence, it is very efficient in terms of both bandwidth and disk usage. Due to this, we can issue the following command set to create a new snapshot of the CentOS 7 updates on August 8:

$ pulp-admin rpm repo create --repo-id='centos7-08aug19' --relative-url='centos7-08aug19' --feed=http://mirror.centos.org/centos/7/updates/x86_64/
$ pulp-admin rpm repo sync run --repo-id='centos7-08aug19'
$ pulp-admin rpm repo publish run --repo-id='centos7-08aug19'

You will recognize the similarity with the commands we ran earlier in this section to create the August 7, 2019 snapshot. They are, in fact, identical except for the new repository ID (--repo-id) and URL (--relative-url), which carry the new date to differentiate this snapshot from our earlier one. This process will run as before, as shown in the following screenshot. It appears that all packages are downloaded, and at this stage there is little clue as to what goes on behind the scenes:

However, let's now examine the disk usage:

Here, we can see that the disk usage has been rounded up to 34 GB—we would likely find the usage considerably less if we used a more fine-grained measure. In this way, Pulp allows us to create snapshots almost as we require them, without consuming vast amounts of disk space, while retaining older ones for stability purposes until new ones are proved, at which point redundant snapshots can be deleted.
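
For a more fine-grained measure than whole gigabytes, the filesystem usage can be sampled in kilobytes before and after creating a snapshot. The following is a sketch under stated assumptions: used_kb is a hypothetical helper, and /var/lib/pulp is assumed to be where your Pulp data lives (the script falls back to / so that it still runs on a machine without Pulp).

```shell
#!/usr/bin/env bash
# Sketch: measure filesystem usage in kilobytes around a snapshot operation.
# used_kb is a hypothetical helper; PULP_DIR is an assumed data directory.
set -eu

# used_kb PATH: kilobytes used on the filesystem that holds PATH.
used_kb() {
    # -P gives one line per filesystem, -k reports 1024-byte blocks;
    # the third column of the second line is the "Used" figure.
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

PULP_DIR="${PULP_DIR:-/var/lib/pulp}"
[ -d "$PULP_DIR" ] || PULP_DIR=/   # fallback so the sketch runs anywhere

before="$(used_kb "$PULP_DIR")"
# ... create, sync, and publish the new snapshot here ...
after="$(used_kb "$PULP_DIR")"
echo "additional space used: $(( after - before )) KB"
```

Bear in mind that this measures the whole filesystem, so any other activity on it during the sync will be included in the difference.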

It is worth saying in this regard that deleting a repository from Pulp does not necessarily free up disk space. The reason for this is that the package de-duplication at the backend must be careful not to delete any packages that are still required. In our example, more than 99% of the packages from our August 7 snapshot are also in the August 8 one, so it is important that if we delete either of them, the other remains intact.

In Pulp, this process is called orphan recovery, and it is the very process of finding packages that no longer belong to any repository (presumably because the repository was deleted) and tidying them up.
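
The idea of an orphan can be pictured with a toy model: a shared package store, plus one directory of symbolic links per repository. The sketch below is an illustration only and makes no claim about how Pulp implements this internally; list_orphans, the directory names, and the package names are all hypothetical.

```shell
#!/usr/bin/env bash
# Toy model only: a shared package store plus per-repository symlinks.
# It illustrates the idea of orphans; it is not Pulp's implementation.
set -eu

# list_orphans STORE PUBLISHED
# Print store files that no symlink under PUBLISHED points to.
list_orphans() {
    local store="$1" published="$2" refs unit
    refs="$(mktemp)"
    # Record every symlink target still referenced by a repository.
    find "$published" -type l -exec readlink {} \; | sort -u > "$refs"
    for unit in "$store"/*; do
        [ -e "$unit" ] || continue   # skip the unexpanded glob of an empty store
        grep -Fxq -- "$unit" "$refs" || basename "$unit"
    done
    rm -f "$refs"
}

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
store="$work/content"
mkdir -p "$store" "$work/published/aug07" "$work/published/aug08"

for pkg in bash-1 kernel-1 kernel-2; do
    echo "payload of $pkg" > "$store/$pkg.rpm"
done

# The August 7 snapshot holds the old kernel; August 8 holds the new one.
ln -s "$store/bash-1.rpm"   "$work/published/aug07/"
ln -s "$store/kernel-1.rpm" "$work/published/aug07/"
ln -s "$store/bash-1.rpm"   "$work/published/aug08/"
ln -s "$store/kernel-2.rpm" "$work/published/aug08/"

echo "orphans before delete: $(list_orphans "$store" "$work/published" | tr '\n' ' ')"
rm -rf "$work/published/aug08"   # delete the August 8 repository
echo "orphans after delete:  $(list_orphans "$store" "$work/published" | tr '\n' ' ')"
```

Before the deletion, nothing is orphaned. Afterward, only kernel-2.rpm is reported, while bash-1.rpm survives because the August 7 snapshot still references it.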

Completing our current example, suppose that we tested our August 8 snapshot and the updated packages in it caused problems in testing. From this, we have determined that this snapshot is not suitable for production and that we will delete it, pending creation of a new snapshot when a fix becomes available:

  1. First of all, we must delete the repository itself:
$ pulp-admin rpm repo delete --repo-id='centos7-08aug19'

This removes the repository definition and the published URL on the Apache server so that it can no longer be used.

  2. To clean up any orphan packages, we can then issue the following command:
$ pulp-admin orphan remove --all

This command is a general cleanup that removes all orphans from across the entire Pulp server and is a good general maintenance step. However, the command also accepts options for more fine-grained control, so that only a specific type of orphan is removed (for example, you could clean out all orphan RPMs, but not DEB packages).

  3. Once this step is completed, we will see that the additional disk space used by the new snapshot has been recovered.
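
The delete and orphan cleanup steps can be combined into one helper in the same spirit. As before, this is a sketch under assumptions rather than a definitive implementation: run, retire_snapshot, and DRY_RUN are hypothetical names, a pulp-admin login is assumed to be in place, and the script defaults to printing the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: delete a rejected snapshot and then clean up orphaned packages.
# run, retire_snapshot, and DRY_RUN are hypothetical names, not part of Pulp.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# retire_snapshot REPO_ID
retire_snapshot() {
    local repo_id="$1"
    run pulp-admin rpm repo delete --repo-id="$repo_id"
    # Removes every orphan on the server, not only those from this repository.
    run pulp-admin orphan remove --all
}

retire_snapshot centos7-08aug19
```

Because the orphan removal is server-wide, it is worth confirming that no other repository deletions are pending review before running it with DRY_RUN=0.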

So far in this section, we have stepped through all the Pulp commands and activities manually. This has been done to provide you with a good understanding of the steps required in setting up Pulp and the accompanying repositories. In day-to-day operation, best practice would dictate that these steps are performed with Ansible; however, there are no native Ansible modules to cover all of the tasks we have performed in this chapter.

For example, the pulp_repo module (introduced to Ansible in version 2.3) is capable of creating and deleting repositories, as we have done so far in this chapter with pulp-admin rpm repo create. However, it cannot perform orphan clean-up, and so this command would need to be issued using the shell or command Ansible modules. Full automation with Ansible is left as an exercise for you.

Once our repos are set up, the final step is to put them into use on our Enterprise Linux servers, and we will cover this in the next section of this chapter.

First, though, we will look at some of the nuances of managing DEB packages in Pulp in contrast to RPM-based management.
