Chapter 2. Command and Control

By its very definition, distributed version control eschews centralized control. There are no fixed rules built into Git that will help you to control access to your code—Git is, after all, just a simple content tracker. This can be a real turnoff for some people who are accustomed to version control systems that double as gatekeepers and access control managers. This lack of centralized access controls doesn’t mean your project suddenly turns into anarchy.

In “Project Governance”, you will learn about:

  • Authorship, copyright, and distribution licenses

  • Leadership models, which can set the tone for how contributions are made to your project

  • Codes of Conduct, which establish firm guidelines for expected and acceptable behavior of contributors

Then, in “Access Models”, you will learn how to structure access to your project. Three models are described:

  • Dispersed contributors

  • Collocated contributor repositories

  • Shared maintenance

By the end of this chapter, you will be able to confidently establish an access model for your team that keeps contributors happy, and ensures you are still able to comply with any reporting requirements from regulatory bodies.

Project Governance

If I were the betting type, I’d wager you picked up this book with the intention of learning Git. This section talks about legal mumbo jumbo. If you are the impatient type, you may wonder exactly why I have wasted valuable time on this esoteric topic. Think of this information as a primer that outlines your rights as an author, and also your responsibilities as a steward of a project repository. The content outlined in this section will be slightly more relevant to public, open source projects. Increasingly, though, governments and large enterprises are working with publicly available code, and choosing to make their own code open. (Even Microsoft has many open source libraries available today! Go, Microsoft!)

Producing Open Source Software

In this chapter I cover the highlights for running a project. Software developers and managers who are considering running their project as an open source project should also read Karl Fogel’s Producing Open Source Software. This free book covers everything from publicity and handling growth to legal matters and political infrastructure.

In this section, you will learn about the assignment of authorship for a given piece of code. Later, when you are working with Git, you will see that Git allows you to track who injected each tiny piece of code into your repository. In addition to tracking authorship, you can even use Git to “sign off” on new code that is added to a repository.
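As a small preview of that tracking, here is a minimal sketch of a signed-off commit and the authorship Git records for it. The repository name, the file, and the identity "Ada Example" are all made up for illustration; only the Git commands themselves are real.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

git init -q demo
cd demo
git config user.name "Ada Example"        # hypothetical identity
git config user.email "ada@example.com"

printf 'first line\n' > notes.txt
git add notes.txt
# -s appends a "Signed-off-by:" trailer built from the configured identity
git commit -q -s -m "Add notes"

# The author recorded on the commit
author=$(git log -1 --format='%an <%ae>')
# The full commit message, including the sign-off trailer
message=$(git log -1 --format='%B')
# Line-by-line authorship of the file
blame=$(git blame --line-porcelain notes.txt | grep '^author ')

echo "$author"
echo "$message"
echo "$blame"
```

The sign-off is only a line of text in the commit message. What it means (for example, agreement to a contributor policy) is up to each project to define.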

Copyright and Contributor Agreements

Copyright is the exclusive, assignable, legal right to use and distribute a piece of work. Around the world, the details of copyright legislation vary; however, the general rule is that the person who created a work owns the right to copy and distribute the work. In open source software, the copyright holders agree to license their work to a wider community. Popular Free Libre Open Source Software (FLOSS) distribution licenses are covered in the next section.

If the author was compensated for his or her work product, the copyright will often be granted to the payer or patron. In the United States, this is referred to as a work for hire. It is almost always the case in employer–employee relationships, and it is often written into agreements with contract workers. If you’re not sure whether you own the copyright to your work, check your agreement; if there isn’t a clause, check your local jurisdiction to see if there is an established precedent. In the United States, contractors and freelancers generally don’t fall under the definition of an employee supplied by the Supreme Court, so without such a clause their work usually isn’t work for hire. The terms are broad, though. Ideally, update your contract so that it explicitly states who owns the copyright to your work.

Copyright only covers the specific implementation of a work. You cannot copyright an idea. You may have heard of reverse engineering, which is one way of getting around a specific author’s moral claim to a piece of work. Some jurisdictions around the world also have a restraint of trade clause. This language prohibits an employee (or contractor) from engaging in similar work elsewhere for a period of time. Effectively, this clause prevents employees from starting at a new job and reverse engineering or creating an equivalent piece of work from the one they developed for their former employer. It must be deemed by the courts as a “reasonable” restraint—limited to an industry or specifics about the job; and cannot be so broadly interpreted that the worker is essentially prevented from working at any job.

Patents, in some jurisdictions, do cover the idea behind an invention. Software patents are extremely contentious because they are perceived in many cases to stifle innovation. Patents are never automatically granted and always involve an application within a specific jurisdiction.

If you are participating on an open source project on behalf of your employer, the assignment of copyright might be a bit more complicated. This is especially true if the project has a policy to only accept work from individuals, and your place of employment retains all copyright on the work you produce; it may also be true if your place of employment has rules about what you are allowed to work on in your free time. (I can name specific examples of both open source projects and companies with these restrictions.) I am not a lawyer and cannot give you legal advice. Only you can choose if you want to ask permission or beg forgiveness. I can, however, highlight the issue of copyright and encourage you to consider what is most appropriate for everyone in the long term. It would be a shame if your work had to be removed from an open source project for any reason. Radical transparency is risky, but I think it’s worth it in the end.

To increase their future powers, some corporations have opted to put a contributor agreement on their public projects. Canonical, Chef, Puppet, Google, and .NET all have a variation on a contributor license agreement. The agreement varies per company, but the gist of most of them is “if you choose to submit a contribution, you agree to reassign your copyright to the project.” Just as there is a Creative Commons license for content, there is now a Harmony Agreements template for contribution agreements. The biggest rationale I’ve seen for a contributor agreement is that it allows the project to change the distribution license of a project without explicit consent from individual contributors. In open source software, these contributor agreements are often perceived as being against the spirit of open source. On the other hand, it can make it difficult for corporations to make legal decisions regarding that software in the future if they don’t own the copyright.

Distribution Licenses

Once you have determined copyright for your project, the next piece you need to establish is the distribution license. This will clarify how you want others to use, or not use, your project.

GitHub has put together an excellent primer for the more popular open source licenses it recommends. The primer includes the following licenses:

  • The MIT License allows people to do anything they want with your code as long as they provide attribution back to the original authors of the work, and do not hold you liable for the software. jQuery and Rails both use an MIT license.

  • The Apache License is similar to the MIT License, but it also explicitly grants patent rights from contributing authors to users, and requires a change notice that describes how the derivative work changes from the previous version. Apache, Subversion, and NuGet use an Apache license.

  • The GNU General Public License (GPL), V2 or V3, is a sharing-friendly copyleft license that requires anyone who distributes your code or a derivative work to make the source available under the same terms. V3 is similar to V2, but further restricts use in hardware that forbids software alterations. Linux, Git, and WordPress use this type of license.

  • If your content isn’t code, a Creative Commons license may be more appropriate for your work. This license allows you to grant redistribution rights, with or without modification, for commercial or noncommercial use.

You are also welcome to not choose a distribution license; however, this effectively signals to people that you are not interested in others using your work without seeking explicit permission.

When to Not Use a Distribution License

Using a distribution license on a public project is almost always a good idea. That said, I sometimes choose to omit a distribution license on my public repositories. Typically this happens if I think I may incorporate the work into a full-length book with a traditional publisher. Some publishers require you to reassign copyright to them and will protect the work on your behalf. (O’Reilly leaves all copyright with the original author.) If I have accepted contributions from others under an open license, it may impact my ability to reassign copyright later.

If you encounter a public project that does not have an explicit license, and you want to incorporate the work into your own, get in touch with the project maintainers first and ask them to add a license to their work.

Leadership Models

Open source software allows people to collaborate on building systems that are more powerful, more secure, more feature-rich, and more sustainable when the burden of maintenance is shared among many. If you are a project of one, it might not make sense to create a governance document, but if you are anticipating others contributing as well, you should consider outlining how you want the project to be run.

A few of the governance models I have participated in include:

Benevolent Dictator for Life (BDFL)

In this model, the leader of the project has final say over every decision about every aspect of the code base. The BDFL may not actively participate in every code review, but ultimately retains the control to reject or reverse any decision made. The community exists at the whim of the dictator. Sounds horrible, right? Well, it can be if the dictator isn’t benevolent. This model has been successfully used by the Ubuntu project, and others.

Consensus-driven, leader-approved

The Drupal community works on a consensus model where the community most active on a given part of the system is encouraged to find solutions that are appropriate. When the community is happy with the solution, they mark an issue as Reviewed and Tested by the Community (RTBC, which is a backronym for Ready to be Committed). Drupal has additional working groups for content, licensing, and security issues.

Technical review board or Project Management Committee

A fork of the Drupal project, Backdrop, distinguished itself early in the project by adopting an explicit governance model, which is based on the Apache project Project Management Committee (PMC) model.

If you would like more guidance on setting up a governance plan for your project, I recommend resources by Lisa Welchman, including her book Managing Chaos (Rosenfeld Media).

Code of Conduct

Some communities have made the difficult decision to reject code from community members who refused to behave in a friendly manner toward others in the community. Other communities, however, are notorious for their unfriendly, intolerant behavior. You may be able to think of several communities you enjoy participating in, and want to emulate in your own project.

Community culture is the consistent reinforcement of behavioral standards. Although you may wish to simply cross your fingers and hope that people are excellent to each other, there may come a day when you wish you had a rule book you could point to. A community code of conduct allows you to explicitly detail what is expected of those who participate in your project. There are several established codes of conduct that have been community vetted. You may wish to begin with one of these as your starting point.

Flickr had the first community code of conduct that I was aware of, and it made a point of ensuring its members knew there were guidelines in place. I’m sure it has changed since I first read the document; you can read the current version at Flickr Community Guidelines.

The Drupal Code of Conduct is the one I’m most familiar with. It was derived from an early version of the Ubuntu Code of Conduct (a newer version is now available), and has even been used as inspiration for the Humanitarian ID Code of Conduct, a project by the United Nations Office for the Coordination of Humanitarian Affairs.

It is appropriate to add your Code of Conduct (CoC) document to the project’s supporting website. If you do not have a separate website for your project, you could add your CoC as a wiki page within GitHub. Links to wiki pages are available in the righthand sidebar from the home page for the project.

Access Models

If you have been using version control for a long time, you may remember systems like CVS or Subversion with a centralized repository. Figure 2-1 demonstrates how changes were made in Subversion’s centralized system. In this system, each time you wanted to save a snapshot of your work to the repository, you were potentially saving to the same place as someone else. Just when you thought you were ready to share your work, or request a code review, you would sometimes be prevented from doing so if someone else had recently updated the same branch with their own work.

Committing files to the repository in Subversion means saving your files to a remote repository. There are no local commits with a centralized version control system.
Figure 2-1. Working with files in Subversion

Git, on the other hand, is a distributed version control system. This means instead of having one central place that everyone must use if they want to have their changes recorded, each person works independently from the centralized code hosting system, and is responsible for making commits to his or her local copy of the repository. This means changes from other developers are never forced into your work; instead, it is your decision of when to incorporate outside work, and when to share your own.

Establishing Connections to Others

Although people love to talk about coding from airplanes that don’t have an Internet connection when working with Git, I think the real advantage is that you can do more of your thinking in private. You can make new branches, think about new ideas in code and—only when you’re ready—establish a connection with others.

If you subscribe to Myers-Briggs, Git might be INTP, and Subversion might perhaps be ESFJ.

Every time you sit down to work with Git, you are sort of working in a centralized fashion as far as your computer is concerned; your repository of changes is entirely self-contained on your local machine, as shown in Figure 2-2. You do some work, and then save that work to your local repository. Then, when you’re ready to share your work with others, you make a connection to a remote repository and push your copy of a specific branch to it.

Committing files to the repository in Git means saving your files to a local repository. If you want to share your work, you need to issue a separate command.
Figure 2-2. Working with files in Git

Keeping your work entirely local would be very limiting! Instead, we make connections to other systems, and share our code through the remote repositories.
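The two steps, saving locally and then sharing, can be sketched as follows. This is an illustration under assumptions: a bare repository on the local disk stands in for the remote server, and the names shared.git, laptop, and chapter.txt are invented for the example.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository stands in for the shared, remote location
git init -q --bare shared.git

git init -q laptop
cd laptop
git config user.name "Ada Example"        # hypothetical identity
git config user.email "ada@example.com"
git symbolic-ref HEAD refs/heads/main     # call the first branch "main"

echo "draft" > chapter.txt
git add chapter.txt
git commit -q -m "Start chapter"          # saved to the local repository only

# Nothing has reached the shared repository yet
refs_before=$(git --git-dir="$workdir/shared.git" for-each-ref)

# Sharing is a separate, deliberate step
git remote add origin "$workdir/shared.git"
git push -q origin main

shared_head=$(git --git-dir="$workdir/shared.git" rev-parse refs/heads/main)
local_head=$(git rev-parse HEAD)
echo "local:  $local_head"
echo "shared: $shared_head"
```

Until the push, the commit exists only in the laptop repository. After it, both repositories report the same commit for main.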

Git itself does not have the ability to control access; instead, it allows any developer who can reach a repository full read/write access to it. At the coarsest level, you limit access through login controls. I develop on my machine, to which you don’t have access, and therefore you cannot change my repository. As soon as we put the repository in a shared location, such as a centralized code hosting server, we need to agree on how we will govern our access to the repository.

Some Git hosting systems, such as Bitbucket, allow fine-grained, per-branch access controls; however, most force you to limit control on a per-repository basis. In other words, you either are a committer for any branch on the repository, or you are limited to making your contributions through pull requests.

In this section, we cover the three most popular models:

  • Single Repository; Shared Maintenance, wherein everyone on the team is considered a maintainer and is granted access to upload changes to the project repository.

  • Collocated Contributor Repositories, wherein contributing developers create a remote copy of a project, and have their changes accepted by project maintainers.

  • Dispersed Contributor Repositories, wherein code is shared via a text-based patch file.

At the end of the section, you will learn how to chain these methods together to create a custom access model that is perfect for your team.

Dispersed Contributor Model

When Git was originally conceived, conversations about changes to the code base of an open source software project commonly happened on public mailing lists, not on centralized web hubs. This model is still used today by the Git development team. It is almost certainly not appropriate for your team to use this model for its development; however, understanding the model has implications for some of the more advanced concepts required to use the commands rebase (Chapter 6) and bisect (Chapter 9).

To share their work with the community, developers would create a patch file using the program diff. They would then write an email to the discussion group, and attach their patch file as shown in Figure 2-3. To investigate the proposed changes, members of the mailing list would download the attached patch file, and apply it to their local code base, using their system’s patch command.

By sharing the patch files via a mailing list, developers were able to encapsulate and share their work—while efficiently limiting what was shared to that patch file so that the people evaluating the work could easily see what had changed between two specific points in time within a shared code base.
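A minimal sketch of that exchange, assuming the standard diff and patch programs are installed. The directory and file names are invented, and the email step is left out: the patch file simply sits on disk where the reviewer can reach it.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Two copies of a tiny code base: the original and the contributor's edit,
# plus the reviewer's own untouched copy
mkdir original modified reviewer
printf 'hello\nworld\n' > original/greeting.txt
printf 'hello\nteam\n'  > modified/greeting.txt
cp original/greeting.txt reviewer/greeting.txt

# The contributor creates a unified diff. diff exits with status 1 when
# the files differ, so "|| true" keeps "set -e" from stopping the script.
diff -u original/greeting.txt modified/greeting.txt > change.patch || true

# The reviewer applies the patch file to their own copy
patch -s reviewer/greeting.txt < change.patch

cat change.patch
cat reviewer/greeting.txt
```

The patch file contains only the changed lines and a little context, which is what makes it easy to read in an email.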

Form Follows Function

To make the process of working with emailed patch files easier, Git added the ability to deal with patches that were sent via a mailing list through the command am.

This model is still used by the Git project today—it is still using a mailing list to share patches, and have conversations about what features should be added to Git and what bugs should be removed.
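Here is a rough sketch of the Git-native version of that exchange, using format-patch to export a commit and am to apply it. The repository names, identities, and the outbox directory are assumptions for the example, and the email itself is skipped: the maintainer reads the patch straight from disk.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# The project as it exists before the contribution
git init -q project
cd project
git config user.name "Maintainer Example"       # hypothetical identity
git config user.email "maintainer@example.com"
echo "version 1" > README
git add README
git commit -q -m "Initial commit"
cd ..

# A contributor clones it, commits a change, and exports it as a patch
git clone -q project contributor
cd contributor
git config user.name "Contributor Example"      # hypothetical identity
git config user.email "contributor@example.com"
echo "version 2" > README
git commit -q -a -m "Bump version"
git format-patch -q -1 -o "$workdir/outbox"     # one email-style file per commit
cd ..

# The maintainer applies the patch; the original author is preserved
cd project
git am -q "$workdir"/outbox/*.patch
applied_author=$(git log -1 --format='%an')
applied_committer=$(git log -1 --format='%cn')
applied_subject=$(git log -1 --format='%s')
echo "$applied_subject (author: $applied_author, committer: $applied_committer)"
```

Note the two names on the resulting commit: the contributor stays the author, while the person who ran am is recorded as the committer.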

Although the model might seem archaic, it does have some advantages:

  • You don’t need to use a specific version control system locally, because a patch file can be created and applied without any version control software installed.

  • Developers can easily review the proposed changes from the comfort of their email application.

  • This model encourages whole idea thinking. If you have to email a group of people each time you make a change, you are more likely to ensure everything is right so you can avoid the embarrassment of “just one more thing.”

  • Uploading your proposed changes to a system that is not the code hosting system enforces a review procedure among the participants in the software project. In other words, as a developer, I can’t just upload my changes to the main repository; I have to announce my completed work and wait for someone else to merge it in.

Developer clones the project repository locally; makes changes and submits a patch for review; once reviewed and accepted by the community, the committer applies the patch to the project repository.
Figure 2-3. The community review process for patches

Having dispersed repositories isn’t specific to projects that communicate via mailing lists. At the time of this writing, the Drupal project was using a variant of this model. Instead of using a mailing list to share patches, though, it was using a self-hosted, centralized code hosting and ticket issue system. Figure 2-4 shows a screenshot of an issue with an attached patch file.

One of the comments in the issue queue includes an attached patch which has been reviewed by the automated tests
Figure 2-4. A Drupal issue queue with attached patch file

In this model, you can sign the individual commits before sharing them; however, this makes it more difficult to unpack the history of who made which changes if multiple people were involved. The team will need to, instead, adhere to a patch formatting policy (signed or not), and a commit message style. Drupal has strict formatting guidelines for its commit messages to ensure everyone receives credit for their work.

For most projects starting today, this model is not appropriate. It does, however, help you understand some of the more advanced commands, such as bisect, if you are able to think about commits as whole ideas. A more modern approach to this model is to use forked, or cloned, repositories on a single code hosting system.

Collocated Contributor Repositories Model

These days, software developers are unlikely to trade patch files; instead, they are much more likely to use a central code hosting system that manages the patch process for them. Using a single code hosting system makes it easier to programmatically create and submit patches between repositories. The method for how these patches are managed is the secret sauce that makes up any code hosting system. The restrictions are presumably enforced on the server, for example through Git’s server-side hooks (pre-receive and update), to ensure access control is respected. A pre-commit hook runs only in a contributor’s local repository, so it cannot enforce access to a shared one.

On a collocated system, the “upstream” project retains complete control over who is allowed to write to the primary project repository. Individual contributors make a clone, or fork, of the project to their own repository on the code hosting system. The contributors make changes to the copy, and then submit their requested changes in the form of a merge request or pull request, as shown in Figure 2-5. If you are working on an open source project with a lot of contributors, you are most likely using this model.

Three repositories, which were created as clones, have become chained together.
Figure 2-5. Creating a chain of cloned repositories

GitHub has popularized this model for development for contemporary open source projects. I’ve also seen this model used for internal projects with strict walls between departments. For example, if the quality assurance team is solely responsible for the final merging of code into the stable release branch, the team is likely using some variation on this model. Another reason for this separation would be if you were using extra contractors and you wanted to limit their ability to accidentally add something to the repository that hadn’t first undergone a review of some kind.

Git Versus GitHub Terms

It can be difficult to know which terms to use because the GitHub terms, which have become commonplace, don’t always match their corresponding Git commands. For example, the GitHub term fork uses the Git command clone to create a copy of a repository. Because the focus of this book is on the Git software, and not just the implementation on GitHub, the Git commands will be used. Occasionally both terms will be used because the GitHub terminology is sometimes more familiar than the individual commands.

When GitHub creates a fork of a repository, it is the same as using the Git command clone to make a copy of a repository. Once you have created a fork, you can use the GitHub web interface to make your changes directly to your repository, but this isn’t great for more than a very minor typo fix. Instead, you will likely create a second clone of the repository, this time from the forked repository to your local workstation. This effectively creates a chain of clones from one copy to another. Keeping all of the repositories in sync takes a little bit of work; however, there are far fewer commands to memorize than when working directly with patches. You win some, you lose some.

Working with repositories that share the same infrastructure should be easier than the dispersed repositories because it allows you to more easily use wrapper software. In addition to it being a little easier to keep the work updated, the wrapper software can also give you more control over who is able to commit work and receive credit for their work.

Typically, the first repository in the chain can only be altered by a handful of core committers who can add new commits to the repository, or merge branches. Most of the people working on the project will, instead, be working from a local clone of the repository. In this local, cloned repository, each person will have infinite control over what happens. They can add new branches, add new code, and share their proposed changes with others by pushing their work to their public clone of the main repository. Once the work has been pushed to the public clone, coders can solicit feedback on their work to date. Once the work has been fully reviewed and tested by the community, the coders can make a merge request or pull request from their public clone to the main repository.

If someone doesn’t have the intention of contributing their work back to the main project, they can skip creating a public clone, and instead create a clone from the main project directly to their local environment. Things can get a little tangled if you realize you do have changes you want to submit back to the project, and you’ve also done your own work, which shouldn’t be shared.

It isn’t always easy to know that you’re going to do something that might be useful to others, though. For example, I was working on my slide deck for OSCON with an open source presentation framework, reveal.js. Your equivalent example might be with a WordPress theme, or a frontend framework, or some other project that gives you a basic starter kit as part of the initial package.

Previously while working on my slides with reveal.js, I decided I probably wouldn’t need to upgrade the reveal.js software I was running and stopped worrying about keeping a Git connection to the upstream project. I shuffled all of the folders around in my repository to make it work for what I was doing. A custom theme was created. Tweaks were made. It had truly become a forked project, disconnected from where it began. (Developers with even a little bit of open source experience will be groaning at this point because they’re already jumping ahead to the inevitable realization that I’m about to reveal.) But as I started working on things, I realized I couldn’t get the slides to format properly for the handout. I wanted my speaker notes to appear alongside the slide, instead of having them tucked below it. I opened a bug report for the project on GitHub, and continued working. A few people gave me suggestions on how I might want to reformat things. Aha! I had some ideas on how to solve the problem. I considered my own issue closed, but there were others who were also interested in my solution. Now I was truly stuck. I had created my project without the intention of sharing my work.

If you are submitting a patch, you might be able to cheat and share only a snippet of your work, but when you are working with collocated contributors, you need a chain of repositories in place to be able to share your work back. My own project didn’t have a branch for the upstream work because I never had the intention of sharing my work back to the presentation framework. So I started by creating a new chain of repositories. Figure 2-6 shows the sequence of what I did next. On GitHub, I created a fork of the main reveal.js project. Then I made a local clone of this forked repository. In my local clone I created a new branch for my changes. Then I copied the changes from my OSCON slide deck (there were only a few, so I didn’t bother creating a patch; I just used my trusty copy-and-paste tools) into my cloned repository of the presentation framework. With the changes in place, I pushed my changes back to my remote repository on GitHub, and created a pull request to ask to have my changes incorporated back into the project.

The public clone of the reveal.js repository was required because I do not have write permission for the reveal.js repository. If I did have write access, I could have skipped making the public clone and just created a local clone.
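The chain of repositories can be sketched with plain Git commands. This is an illustration under assumptions, not a record of the exact commands used for the reveal.js change: local bare repositories stand in for the project and the fork on the hosting system, and the names upstream.git, fork.git, and speaker-notes are invented. On GitHub, the fork is created through the web interface, and the pull request is a feature of the hosting system rather than a Git command.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the main project: seed it with one commit on "main"
git init -q seed
cd seed
git config user.name "Maintainer Example"       # hypothetical identity
git config user.email "maintainer@example.com"
git symbolic-ref HEAD refs/heads/main
echo "slides" > index.html
git add index.html
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed upstream.git           # the main project repository

# 1. The "fork": a hosted copy that the contributor is allowed to write to
git clone -q --bare upstream.git fork.git

# 2. A local clone of the fork; "origin" now points at the fork
git clone -q fork.git local
cd local
git config user.name "Contributor Example"      # hypothetical identity
git config user.email "contributor@example.com"
# A second remote keeps a connection to the main project
git remote add upstream "$workdir/upstream.git"
git fetch -q upstream

# 3. Work on a branch, then publish that branch to the fork
git checkout -q -b speaker-notes
echo "notes beside slide" >> index.html
git commit -q -a -m "Show speaker notes beside each slide"
git push -q origin speaker-notes

local_head=$(git rev-parse HEAD)
fork_branch=$(git --git-dir="$workdir/fork.git" rev-parse --verify -q refs/heads/speaker-notes)
upstream_branch=$(git --git-dir="$workdir/upstream.git" rev-parse --verify -q refs/heads/speaker-notes || true)
echo "fork has the branch at: $fork_branch"
echo "upstream has the branch at: ${upstream_branch:-nothing yet}"
```

At the end, the branch exists in the local clone and in the fork, but not in the main project. Getting it there is the maintainers' decision, which is the point of the model.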

Shared Maintenance Model

Finally, we have arrived at what is likely the most typical permission model for internal teams (and teams of one): shared maintenance. In this model, there is an inherent trust among team members. It is assumed that code will be checked and verified before it is committed to the main project branch, and that, generally, the developers are trusted. In this model, work is done locally by all developers before it is pushed into the shared repository for the project. When working with an internal team, as shown in Figure 2-7, this is often where we start: with a single shared repository that everyone has shared write access into.

Three chained repositories, with the work happening on the third, local, repository; and the pull request being initiated from the second, remote, repository.
Figure 2-6. Suggesting changes to a project from collocated repositories

Git does not accommodate permissions and instead relies on other systems to grant or deny write access to a repository. If you do need to prevent people from uploading their code to a shared repository, you need to use the host system’s access control to do so. If you are not using a Git hosting platform, this access control might be controlled via SSH accounts.

In addition, Git does not let you lock people out of only some branches, as you might in Subversion. Without additional software in place, it is by convention that teams agree not to commit changes to specific branches without the requisite testing. Per-branch access restrictions are available through Bitbucket (Chapter 11) and GitLab (Chapter 12). If you prefer a more lightweight system, take a look at Gitolite.
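To give a feel for what that additional software does, here is a rough sketch of a server-side update hook that guards a single branch. Git runs the update hook once for each pushed ref and rejects the ref if the hook exits with a nonzero status. Everything about the policy is an assumption made up for this example: the protected branch name stable, the stable-maintainers file, and identifying the pusher by login name (which presumes each person connects through their own SSH account). Hosted platforms and Gitolite implement their own, more complete, versions of this idea.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

git init -q --bare central.git
mkdir -p central.git/hooks

# The hook receives three arguments: <refname> <old-sha> <new-sha>
cat > central.git/hooks/update <<'EOF'
#!/bin/sh
refname=$1
# Hypothetical policy file: one permitted login name per line
allowed_file="${GIT_DIR:-.}/stable-maintainers"
if [ "$refname" = "refs/heads/stable" ]; then
  pusher=$(id -un)
  if ! grep -qx "$pusher" "$allowed_file" 2>/dev/null; then
    echo "update hook: $pusher may not push to stable" >&2
    exit 1
  fi
fi
exit 0
EOF
chmod +x central.git/hooks/update

git init -q dev
cd dev
git config user.name "Developer Example"        # hypothetical identity
git config user.email "dev@example.com"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git remote add origin "$workdir/central.git"

# Pushing to an unprotected branch is accepted
git push -q origin HEAD:refs/heads/topic

# Pushing to the protected branch is rejected: no maintainers are listed yet
if git push -q origin HEAD:refs/heads/stable 2>/dev/null; then
  stable_push=accepted
else
  stable_push=rejected
fi
echo "push to stable was $stable_push"
```

Adding a login name to the stable-maintainers file inside central.git would let that account update stable, while everyone else could still push to other branches.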

One central repository with three developers all pushing their branches.
Figure 2-7. Everyone on the team has write access to the central repository from their local repository
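A minimal sketch of shared maintenance, assuming two developers (the made-up alice and bob) who each have a local clone of the same central repository, here simulated with a bare repository on disk. It also shows the one safeguard Git provides by itself in this model: a push is refused if it would discard commits someone else has already shared.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Seed the central repository with one commit on "main"
git init -q seed
cd seed
git config user.name "Team Example"             # hypothetical identity
git config user.email "team@example.com"
git symbolic-ref HEAD refs/heads/main
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed central.git

# Each developer commits in a local clone of the same central repository
for dev in alice bob; do
  git clone -q central.git "$dev"
  git -C "$dev" config user.name "$dev"
  git -C "$dev" config user.email "$dev@example.com"
  echo "work by $dev" > "$dev/$dev.txt"
  git -C "$dev" add "$dev.txt"
  git -C "$dev" commit -q -m "Add $dev's work"
done

# Alice shares first; her push is accepted
git -C alice push -q origin main

# Bob's clone is now behind, so his push is refused until he integrates
if git -C bob push -q origin main 2>/dev/null; then
  bob_first_push=accepted
else
  bob_first_push=rejected
fi

git -C bob pull -q --no-rebase --no-edit origin main   # merge Alice's commit
git -C bob push -q origin main

central_files=$(git --git-dir=central.git ls-tree --name-only main)
echo "bob's first push: $bob_first_push"
echo "$central_files"
```

Nothing here checks who is pushing. Both developers have full write access, and the only refusal is about history, not permission.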

Custom Access Models

In addition to these individual strategies, teams may also choose to use multiple access models for a given project. This would be particularly useful for projects with very strict regulations on who could commit code to the canonical repository. Indeed, most open source projects will have different levels of access for different contributors.

A common workflow is as follows:

  • An official project repository, to which only a very few people are able to commit code. In an open source project, it would be the project maintainers; and in a closed source, or corporate, project, it could be the quality assurance team.

  • A less restricted, internal copy of the repository, which is used for integration by each of the contributors and project teams. This repository might follow a shared maintenance model, where everyone is allowed to merge their branches into the repository as part of a code review process, or even on an ad hoc basis.

  • Individually created personal repositories, locked to the individual contributors. These are typically hosted on the same code hosting system as the official repository, because most modern code hosting systems have easy-to-integrate functionality (usually called a “pull request” or “merge request”).

This split would commonly be seen in teams that employ junior developers, quality assurance teams, or perhaps external contractors.

Chapter 4 covers common workflows in more depth.

Summary

In this chapter, you learned about different ways to grant and restrict access to your project repository:

  • Clearly defining a project governance model will help ensure ownership is understood by all contributors.

  • Copyright of code is typically assigned to the author, unless the right has been reassigned to another legal entity as a work for hire or through a contributor agreement.

  • The rules restricting distribution and derivative works of a code base are defined by its software license.

  • Git is just a simple content tracker; it does not include access control mechanisms out of the box. Some code hosting systems have incorporated server-side hooks that can be used to limit access per branch.

  • Access can be limited or open for any given repository. Changes submitted to a repository are made via a patch. On code hosting systems, a programmed graphical interface is used to manage the patch submission process.

With your permission structure in place for your repository, we will next look at how you can divide your repository so that both work in progress and finished work can be shared among team members.
