Chapter 3. Branching Strategies

In version control, a branch is a way to separate parallel thinking about how a piece of code might evolve. A branch always begins from a specific point in the code base. In Chapter 2 we talked about forking and cloning a repository. A branch is like an in-repository split where new work begins. A branch might be created with the intention of contributing work back, or it might be created with the intention of keeping work separate. Branches don’t care what changes they’re tracking! They just are.

The branching strategy that you use depends on your release management process. Branches allow you to change the files that are visible in the working directory for your project, and only one branch can be active at a time. Most branching strategies separate the work in your project by coarse ideas. An idea could be the version of your software—for example, version 1, version 2, version 3. And spawning from those software versions you might have ideas that are in progress. These ideas are generally separated into branches according to the name of the feature they represent. They might be a bug fix or a new feature, but they also represent whole ideas on a smaller scale.

This chapter outlines:

  • How to choose a branching convention for your team

  • Mainline development

  • Branch-per-feature deployment

  • State branching

  • Scheduled deployment

There are no limits to the ways you can use branches. This can be a good thing and a bad thing. A few artificial constraints (conventions) will help you consider the possibilities for your team.

Understanding Branches

Without getting into the internals of how Git works, having a basic understanding of what a branch is will help you to choose and apply the strategies outlined in this chapter.

Each Git repository contains a pool of commits. These commits are linked to one another through their metadata: each commit contains a reference to its parent. In the case of a merge commit, there may be more than one parent commit referenced. I like to think of a branch as a string of beads, with each commit represented as a bead on the string. The analogy isn’t technically correct, but it works quite well as a mental model for our purposes. A branch in Git is actually a named pointer to a specific commit. (Give yourself a magic wand, and tap on a specific bead while saying a name. You have just created a named branch.) When you check out a branch, you are copying the data stored in the commit object (identified by the pointer) to your working directory. Once the work has been copied into the working directory, you can make as many changes as you like (add, edit, delete files), and save the changes as a new commit object to your local repository. The named pointer will be automatically updated to point to the new commit object you have just created, and your branch will be updated.
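The magic wand can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Git is implemented: the names Commit and Repo and their methods are invented for illustration, and commit ids are plain strings rather than hashes. It tries to show the two ideas from the paragraph above: each commit records its parent, and a branch is only a name that points at one commit and moves forward when you commit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Commit:
    """A bead on the string: an id plus references to its parent commits."""
    id: str
    parents: Tuple[str, ...] = ()


@dataclass
class Repo:
    """Hypothetical, simplified repository: a pool of commits and named pointers."""
    commits: Dict[str, Commit] = field(default_factory=dict)
    branches: Dict[str, str] = field(default_factory=dict)  # branch name -> commit id
    current: str = "master"  # name of the checked-out branch

    def commit(self, commit_id: str) -> None:
        """Store a new commit and move the current branch pointer to it."""
        tip = self.branches.get(self.current)
        parents = (tip,) if tip else ()
        self.commits[commit_id] = Commit(commit_id, parents)
        self.branches[self.current] = commit_id

    def branch(self, name: str) -> None:
        """Tap a bead and say a name: record a new pointer to the current tip."""
        self.branches[name] = self.branches[self.current]

    def checkout(self, name: str) -> None:
        """Switch which named pointer will advance on the next commit."""
        self.current = name

    def history(self, name: str) -> List[str]:
        """Follow first-parent links back from the tip of a branch."""
        out: List[str] = []
        commit_id = self.branches.get(name)
        while commit_id:
            out.append(commit_id)
            parents = self.commits[commit_id].parents
            commit_id = parents[0] if parents else None
        return out


repo = Repo()
repo.commit("a")
repo.commit("b")
repo.branch("feature")    # no commits are copied, only a new name is recorded
repo.checkout("feature")
repo.commit("c")          # only the "feature" pointer moves
```

After these steps, the feature branch reaches c, b, and a, while master still points at b. Creating the branch cost nothing more than one new entry in the table of names.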

Any commit objects you create are local and exclusively yours until you choose to explicitly share them with a remote repository. This is radically different from the centralized model of version control, where committing a change automatically uploads the work. For some foreshadowing of conflicts to come, just remember that each developer has a magic wand for his or her own repository.

To avoid conflict, developers have created conventions for the naming and use of branches. These conventions help developers to choose when to allow work to diverge (create new branch), and when to merge (combine commit objects from two or more branches). Generally there are two types of branches used in a convention: a long-running public branch; and a short-lived private branch. The function of a long-running branch is to act as a mediator for code which is contributed by lots of developers. The function of a short-lived branch is to sandbox the development of a new idea. These new ideas could be bug fixes, feature additions, or experimental refactoring. It’s up to you!

When you share a branch with others, you may continue adding commit objects to your copy of the branch; however, now that the branch has been shared, someone else could also be adding commit objects to their copy of the branch. The next time you try to synchronize the two copies of the branch, Git, as a simple content tracker, may not be able to combine the two sets of commit objects on its own. When that happens, it will defer to your expertise in combining them into a single shared history. This pause in the automated process is referred to as a merge conflict, which sounds scary, I’ll admit. Your job is to engage in conflict resolution and choose the best shared history for the work in question.

You will learn about strategies to keep your branches up to date in “Updating Branches”, and practical commands in Chapter 7. Conflict resolution is also covered in Chapter 7. First, though, let’s take a look at some of the most common branch naming strategies developers use for maintaining their work in Git.

Choosing a Convention

A convention is an agreed-upon standard for how things are usually done. As developers, conventions allow us to quickly pick up the patterns of how a software project runs and integrate our work without disrupting the flow for others on the team. A documented convention makes onboarding easier for both the newcomer and others on the team who now need to take less time away from their work to help the new person.

Choosing an appropriate branching strategy for your team requires a conversation with your teammates about how you want to release your work. (From now on, I’ll use “software” to mean your project, even though Git can be used for other things as well, such as writing books!) You might want to use a daily release schedule for a website, but a monthly, quarterly, or biannual release schedule for a downloadable software product. You may even have to comply with auditing or compliance regulations that have their own requirements. Once you know how you will release your software, and whether you have auditing or tracking requirements, you can choose the best branching strategy for your needs.

If you already know how you’ll be working, take a few minutes to sketch out your requirements before diving into the details and choosing the branching strategy that best matches your needs. If you’re not really sure what your system will look like, Chapter 4 will give you ideas about how you might want to structure your team interactions.

As long as your team documents what they’re doing, there are no hard rules. Indeed, if you look at the repositories for several open source projects, you’ll see that there’s no standard way of doing things. I recommend using the GitHub mirrors to easily compare the branching strategies used by Drupal, Git, and Sass. These three very popular projects all use very different branching strategies.

There are no version control police who will show up at your door and tell you if you’re doing things wrong, and you’re almost guaranteed to find at least one other team who’s making software in a similar fashion to you. But if you are new to working with version control, or your team has been struggling to figure out how to make things a little smoother, using one of the conventions described in this chapter might help.

Conventions

When working with software projects, there are generally two different approaches teams can take: they can either use an “always be integrating” approach, or they can collate the work that’s being done and release a collection of work all at once. In between these two opposites there are many different variations on how work can be done.

This section outlines several of the most common strategies used by development teams today. You may choose to adopt one of these strategies wholesale, or adapt it for your needs. No matter what you choose, remember to document your decisions.

Mainline Branch Development

The easiest branching strategy to understand is the mainline branch method. In this strategy, there are fewer branches to work with. The developers are constantly committing their work into a single, central branch—which is always in a deployment-ready state. In other words, the main branch for the project should only contain tested work, and should never be broken.

As a team of one, I often work on tiny side projects that only just barely warrant having version control, such as writing an article for a magazine. In these cases, I commit all of my work in the default branch (named master by Git), as shown in Figure 3-1. If I have two unrelated ideas that I am working on, I might be lazy and choose to commit everything, or I might stash some of the work to save it for later. For these simple projects, it isn’t worth separating my thinking into different branches in order to work efficiently.

Reading Ball-and-Chain Diagrams

Each circle on the diagram represents a commit of work stored in the Git repository that can be reversed. The proper name for these “ball-and-chain” commit diagrams is a directed acyclic graph (DAG). There’s no quiz where you need to remember this. Promise. But it is a useful term if you’re looking for keywords for future research.

Several commits on a time line, each representing a point in time when I committed changes to the repository I was saving my work into.
Figure 3-1. Mainline branch development: storing all commits to a single branch

As the project matures, there will be more and more to think about, and it will get harder to keep track of ideas. I’ll start adding new branches as I think about new directions I might want to take my project in, but that aren’t as fully thought out as some of the other pieces I’m working on. Perhaps I’ll even expand my team and have a reviewer or two with their own, independent branches, as shown in Figure 3-2. As the project grows in complexity (and team members), so will the number of branches. But they won’t all be active all the time. Like in the story of Goldilocks and the Three Bears, your team will likely settle on a number of branch types that feel “just right.” Each unit of work (or sprint) may have an accordion effect on the number of branches. At first, the developers are all working on their own pieces, and the number of branches expands. Then, as each of the developers finishes his or her work and integrates it with the others’, the accordion compresses back down again.

At scale, this approach of having a single working branch is used by teams working with automated build procedures.

Terms for Teams Who Are Always Deploying

Continuous integration is the practice of having all developers incorporate their work into the mainline of the project several times a day. Continuous delivery is the practice of automating the steps from a developer’s local workstation up to the server (but not deploying through an automated process). And finally, continuous deployment is the most complete definition of automation, with all code passing through a series of test gates directly to the production server.

Several commits on a time line, with additional branches arching away from the main branch, representing tech editors.
Figure 3-2. Mainline development with branching: branches separate the work being contributed by multiple people

Perhaps it makes sense for your team to integrate their work into a central branch regularly, but only deploy work occasionally. As soon as you start collecting your work, you need to make a distinction between what you have locally, and what is being used on your production server. If all code is ready for deployment, it shouldn’t be too big of a deal to add a little fix and roll everything out. But what if you have changes committed in your repository that are only mostly finished? This is where we start to move away from a purely continuous deployment strategy, and toward multiple branches in a scheduled deployment strategy.

There are several advantages to using a branching strategy that encourages regular integration of your work:

  • There aren’t very many branches across the entire project. This results in less confusion about where a change disappeared into.

  • Commits that are being made into the code base are relatively small. If there is a problem, it should be relatively quick to undo the mistake.

  • There are fewer emergency fixes, because any code that is saved into the main branch is ready to be deployed. Deployments can often be stressful for developers as they hold their breath while code goes live in production and wait to hear back from the code’s users. With tiny frequent updates, this procedure becomes practiced, and finally automated to the point where it should be almost invisible to the end user.

There are disadvantages to using this strategy as well:

  • The assumption is that the main branch contains deployment-ready code. If your team doesn’t have a testing infrastructure, it can be risky to assume that new code won’t break anything, especially as the project becomes more complex over time.

  • The notion of a deployment is more appropriate for code that is automatically loaded onto a user’s device (for example, a website). It is less appropriate for software that must be downloaded and installed. While updates that fix problems are welcomed, even I would get annoyed if I had to download and reinstall an application on my phone on a daily basis.

  • One of the ways developers can verify code on production is to hide the feature behind a flag or a flipper. Facebook, Flickr, and Etsy are all rumored to use this technique. The potential risk here is that code can be abandoned behind the flags, resulting in a large technical debt for code that isn’t removed because it is hidden.
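The flag technique from the last point can be sketched as follows. This is only an illustration: the FLAGS dictionary and the functions is_enabled and checkout_page are hypothetical names, and nothing here describes how any of the companies mentioned actually implement their flags. The new code path ships to production but stays dormant until the flag is switched on.

```python
from typing import Dict

# Hypothetical flag store; a real project might load this from configuration.
FLAGS: Dict[str, bool] = {"new_checkout": False}


def is_enabled(flag: str, flags: Dict[str, bool] = FLAGS) -> bool:
    """Unknown flags count as off, so unfinished code stays hidden by default."""
    return flags.get(flag, False)


def checkout_page(flags: Dict[str, bool] = FLAGS) -> str:
    """Both code paths are deployed; the flag decides which one users see."""
    if is_enabled("new_checkout", flags):
        return "new checkout flow"
    return "current checkout flow"
```

The risk described above is visible even in this small example: once the new flow is live for everyone, both the flag and the old branch of the if statement need to be deleted, or they will linger as hidden technical debt.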

Unfortunately, it is out of the book’s scope to describe how to set up the infrastructure for continuous deployment because it will be somewhat dependent on the language you are writing in (each language has its own testing libraries) and your deployment tools. If you would like to read more about the philosophy, the book Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation by Jez Humble and David Farley (Addison-Wesley Professional) is a good starting place.

Branch-Per-Feature Deployment

To overcome some of the limitations of the single branch strategy just described, you can introduce two additional types of branches: feature branches and integration branches. Technically, they aren’t different kinds of branches; it’s just the convention of what type of work is committed to the branch that differs.

In the branch-per-feature deployment strategy, all new work is done in a feature branch, which is as small as it can be to contain a whole idea. These branches are kept up to date with the work being done by other developers via an integration branch. When it is time to release software, the build master can selectively choose which features to include in the build and create a new integration branch for deployment. As Figure 3-3 shows, a build does not necessarily include all of the work completed since the last build.

Several commits on a time line, with additional branches arching away from the main branch. The integration branch is updated regularly, but the deployment branch contains only a few of the available feature branches.
Figure 3-3. Branch-per-feature: feature branches are kept up to date via an integration branch

By adding feature branches and an integration branch, you can continue to have deployment-ready code, but also a pause before deploying the code. The most popular description of this model is by Adam Dymitruk. A slightly earlier description of this model was by Scott Chacon and is named the GitHub Flow. With a few minor updates, this process is still used by GitHub today.

In the GitHub Flow branching model, anything in the master branch is deployable. When working on new code, GitHub Flow has the developers create a descriptively named feature branch and commit their work regularly to this branch. This branch is kept up to date with master and is regularly pushed to a branch on the shared repository, allowing others to see which features are actively being worked on. When developers think their work is complete, or when they need help with their work, they will issue a pull request to the master branch. In the ticketing system, there will then be a conversation about the work that is being proposed.

Up to this point, the GitHub Flow is virtually the same as the Dymitruk model. Where they differ is in how the deployment happens. In the Dymitruk model, a build is made by selecting which features are ready to be incorporated. In the GitHub Flow model, once a pull request is accepted, the work is immediately ready to be deployed from its feature branch. This makes the strategy closer to mainline development. Originally, GitHub merged its feature branches into the master branch and then deployed the master branch. Nowadays, the feature branch is deployed and if there are no errors, it is merged into master as shown in Figure 3-4. This means that if there are problems with a feature branch, master can immediately be redeployed because it is proven to be in a working state.
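The deploy-then-merge order can be sketched as a small decision function. This is a simplified illustration, not GitHub's tooling: release_feature is a hypothetical name, and the deploy and merge_into_master callables stand in for whatever deployment and merge steps a team actually uses.

```python
from typing import Callable


def release_feature(
    feature: str,
    deploy: Callable[[str], bool],
    merge_into_master: Callable[[str], None],
) -> str:
    """Deploy the feature branch first; merge it only if the deploy is healthy."""
    if deploy(feature):
        merge_into_master(feature)
        return "merged"
    # master was never touched, so it is still a known working state to redeploy.
    deploy("master")
    return "rolled back to master"
```

Because the merge happens only after a healthy deployment, master never contains work that has failed in production.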

Several commits on a time line, with additional branches arching away from the main branch. The integration branch is updated regularly, but the deployment branch contains only a few of the available feature branches.
Figure 3-4. GitHub Flow: feature branches are deployed after a review and then merged into master

There are several advantages to using a branch-per-feature deployment strategy:

  • Much like mainline development, the focus is on rapid deployment of code.

  • Unlike the mainline development, there is an optional build step. When the build step is used, there is the option to select which features should be incorporated into the master branch for deployment.

There are disadvantages to using a branch-per-feature deployment branching strategy as well:

  • If code is kept on a feature branch, but it is not immediately rolled into master, there is an extra maintenance requirement for developers who need to keep their features up to date while waiting to be rolled into the deployed branch.

  • The semantic naming of the branches helps those who are familiar with the system, but it also represents an insider language that can make onboarding more difficult if there are a lot of open features.

  • There is now a housekeeping requirement for developers to remove old branches as they are rolled into master. This isn’t a large burden, but it is more than would be required from working out of a single master branch.

The branch-per-feature strategy offers a nice middle ground between mainline development and scheduled deployment. In some ways, scheduled deployment extends the branch-per-feature strategy, but with specific naming conventions.

State Branching

Unlike the strategies up to this point, state branching introduces the idea of a location or snapshot for some of the branches. Often our deployment diagrams are overly simplified and suggest that code moves between environments (Figure 3-5), but generally this isn’t really how it happens. Instead, Figure 3-6 shows the code is merged from one branch to another, and each of the branches is deployed to a specific environment. (Yes, we’ll talk about tagged releases later. Patience, grasshopper.) As Figure 3-6 shows, there’s often a mismatch between the branch names that are used and the name of the environment we are deploying to. (What does master mean? Is it for production? For development? Are you sure?) This strategy was described as the GitLab Flow model.

The simplified progression showing a single arrow between servers representing each of Local, Dev, Release Prep, and Production.
Figure 3-5. Deployment lies: code doesn’t really walk from the local server to the production server
The server for local, dev, release prep, and production now represented through the intermediate step of a code hosting system.
Figure 3-6. The real deployment process uses a centralized code hosting system

Through branch naming conventions, GitLab Flow makes it clear what code is going to be used in what environment, and therefore what conditions might need to be met before merging in commits. For example, you would clearly not merge untested code into a branch named production. Alternatively, if you are shipping code to “the outside world,” GitLab Flow suggests having release branches. Ideally, these release branches should follow semantic versioning conventions, although GitLab Flow does not explicitly require it.

Know When to Increment with Semantic Versioning

In semantic versioning, a release should always be numbered as follows: MAJOR.MINOR.PATCH. The first number (MAJOR) should be incremented when you make API-level changes that are not backward compatible. The second number (MINOR) should be incremented when you add new functionality that does not break existing functionality (it is backward compatible). The third number (PATCH) should be incremented when you make backward-compatible bug fixes.
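The three rules can be written as a small function. This is a sketch with a hypothetical helper name, bump, that handles only plain MAJOR.MINOR.PATCH strings. One detail comes from the Semantic Versioning specification rather than from the paragraph above: when a number is incremented, the numbers to its right are reset to zero.

```python
def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # API changes that are not backward compatible
        return f"{major + 1}.0.0"
    if change == "minor":   # new, backward-compatible functionality
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # backward-compatible bug fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

For example, a bug fix released on top of 1.0.0 becomes 1.0.1, while a backward-incompatible change to 1.4.2 becomes 2.0.0.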

An interesting variation on the state branching strategy is the branch naming convention that the Git project uses. It has four named integration branches:

maint

This branch contains code from the most recent stable release of Git as well as additional commits for point releases (maintenance).

master

This branch contains the commits that should go into the next release.

next

This branch is intended to test topics that are being considered for stability in the master branch.

pu

The proposed updates branch contains commits that are not quite ready for inclusion.

The branches work much like a stacked pyramid. Each of the “lower” branches contains commits that are not present in the “higher” branches. As is shown in Figure 3-7, maint has the fewest commits, and pu has the most commits. Once code has passed through the review process, it is incorporated into the next integration branch, getting closer to being incorporated into an official release.
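The pyramid can be modeled with sets. This is a deliberately simplified sketch: it treats each branch as a flat set of made-up commit ids (a through d), whereas real histories are graphs, and the helper names is_stacked and most_stable_branch are invented for illustration. It follows the description above, in which each lower branch holds everything in the branch above it plus something extra.

```python
from typing import Dict, Optional, Set

# Most stable first, following the order described in the text.
ORDER = ["maint", "master", "next", "pu"]


def is_stacked(branches: Dict[str, Set[str]]) -> bool:
    """True when every branch contains all commits of the branch above it."""
    return all(
        branches[higher] <= branches[lower]
        for higher, lower in zip(ORDER, ORDER[1:])
    )


def most_stable_branch(branches: Dict[str, Set[str]], commit: str) -> Optional[str]:
    """Report how far up the pyramid a commit has travelled so far."""
    for name in ORDER:
        if commit in branches[name]:
            return name
    return None


example = {
    "maint": {"a"},
    "master": {"a", "b"},
    "next": {"a", "b", "c"},
    "pu": {"a", "b", "c", "d"},
}
```

In this example, commit c has reached next but not yet master, and commit d is still only a proposed update.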

Four rows of commit blobs representing maintenance, master, next, and proposed update branches. The branches contain increasing commits from top to bottom.
Figure 3-7. Integration branches used by the Git project

There are several advantages to using a state branching strategy:

  • Branch names are context specific and completely relevant to the work at hand.

  • There is no guessing about the purpose of each branch, making it easier for people to select the right branch when merging their work.

There are also disadvantages to using a state branching strategy:

  • It’s not always obvious where to start a branch from without guidance.

  • Because the branch names are extremely specific to the context of that team, it can be harder to get consistency across projects, making onboarding more difficult.

Left to my own devices, I typically end up with this style of branching for my own projects. I like using words that mean something to me instead of terms that meant something to someone else on some other team. Pedants, unite! Unless you prefer your own word. ;)

Scheduled Deployment

Scheduled deployment branching is the most appropriate strategy to use if you do not have a completely automated test suite, and in any situation where you must schedule a deployment. This may be because you have deployment windows (for example, never after 4PM, and never on a Friday); or an additional regulatory gate you need to pass through (for example, iOS applications being deployed to the App Store). As soon as you involve humans in a review process, or someone else’s arbitrary constraints on your deployment process, there will inevitably be delays somewhere, and you will need a way to suspend your work while you wait for the humans.

Through the different types of branching strategies, we have been adding an increasing amount of complexity to the branching that takes place in a repository. We started with just one branch, and then we added features and an integration branch. In a scheduled deployment, we add to this again. However, scheduled deployments can get quite complex in their branching patterns. They should be built up over time, and only as the complexity is warranted.

In this section, I will walk you through the progression of how the GitFlow branching strategy can be implemented by a team. GitFlow, the most popular implementation of this strategy, was first described by Vincent Driessen. It has been used by countless teams around the world to structure software projects. It can look very complex when it is presented in its final form. Fortunately, though, software projects build up to this point; they don’t start out this way. If there are any parts of GitFlow that are not relevant for your team, you can omit them from your project.

Let’s walk through the model together.

At first your software project has a single branch, develop. From this branch, your programmers create a diverging branch and add their features. Figure 3-8 shows that at this point, the diagram of GitFlow looks very similar to the previous models described in this chapter. In this case I will use the term “features” very broadly. A feature could actually be a bug fix, a refactoring, or indeed a completely new feature. Ideally when you’re working with a team, a feature will be described in a ticket before you start your work, and the branch name will resemble the ticket name. For example, if you had a ticket “1234” that was a bug report to fix a broken link, and you were using the convention [ticket_id]-[terse_title], your branch name would be 1234-fixing_links.
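A naming convention is easier to follow when it can be generated and checked automatically. The sketch below assumes the [ticket_id]-[terse_title] convention from the example above, with numeric ticket ids and lowercase words joined by underscores; the function names branch_name and follows_convention are hypothetical.

```python
import re


def branch_name(ticket_id: int, title: str) -> str:
    """Build a branch name such as 1234-fixing_links from a ticket id and a title."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return f"{ticket_id}-{'_'.join(words)}"


def follows_convention(name: str) -> bool:
    """Check that a branch name looks like [ticket_id]-[terse_title]."""
    return re.fullmatch(r"\d+-[a-z0-9]+(_[a-z0-9]+)*", name) is not None
```

A check like follows_convention could be run wherever your team reviews branch names, although where to enforce it is a decision for your team to document.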

One row of commit blobs with several branches arching away representing features.
Figure 3-8. Development and feature branches used in GitFlow

Your team works and works and works and then you get to a point where you say “No new features!” We’ll often refer to this as feature freeze. At this point, a new branch is created from the development branch, as shown in Figure 3-9, and the only things that can be committed to this branch are bug fixes. These bugs may include regressions in performance, security flaws, and other general bits and bobs that are now broken. In more traditional Waterfall team structures, this bug-fixing period would be led by a quality assurance team. In a more Agile team, a developer would follow the issues through the series of branches to deployment, and would even be responsible for testing the work of others. We’ll talk more about the review process in Chapter 8.

The develop branch has forked into two with the release branch incorporating only bug fixes.
Figure 3-9. Feature freeze in GitFlow; only bug fixes are allowed

Perhaps not all features were completed when the feature freeze happened, so there is still work being committed to the develop branch. And if bugs are reported, these bugs need to be incorporated “backward” into the develop branch as well. Figure 3-10 shows our first view of a branching diagram with code being merged in two different directions. The longer your quality assurance period, the more likely you are going to have work happening both on the develop branch and also on the release branch.

The develop branch has forked into two with the release branch incorporating only bug fixes; but the bug fixes are now also rolled into the develop branch.
Figure 3-10. Development continues, but is not incorporated into the release branch

After an amount of time in testing, it will be declared that all bugs have been found, and what remains is ready to be deployed. Congratulations! At this point, all code that has passed quality assurance testing is committed to a new branch, master, which is then tagged (like a bookmark) with the version of the software at that point. The software is then deployed as shown in Figure 3-11. Your project manager gives you a heart-shaped candy, or maybe an animated GIF, and you get the rest of the day off. Good job, team! (If your project manager is not doing this, kindly send them my way and I’ll have a little chat with them on your behalf. We’re all friends here, it’s cool.)

Of course, reality dictates that sometimes bugs that need to be immediately fixed will sneak into the software. These hotfixes are so critical that a programmer should not go home for the evening before they are fixed. They are generally made by initiating a branch from the production branch, and when the hotfixes are released, they do not contain any additional work that has been happening since the last official release, as shown in Figure 3-12.

The release branch has forked to master and master has been tagged release 1.0.
Figure 3-11. Software is released by merging onto a new branch, master, with a tag

Define “Urgent” with Your Team

A developer I used to work with once told me that a bug could only be marked as a hotfix if he wasn’t allowed to go to the pub for a pint of beer before it was fixed. This radically changed my perception of what it meant for a problem to be marked as urgent. We recalibrated our definition of “urgent” and had fewer late nights as a result. In the same vein, I once worked with a client who was willing to mark tickets as “super very important, for later.” Have fun with your naming conventions where you can, but make sure you document what they mean so you can avoid the frustration of things not being completed in a timely manner.

We’ve slowly built up these branches as we needed different places for work to continue happening. You don’t need to create all of these branches to start. In fact, it’s better if you don’t, because it ends up being more code to maintain. Once you’ve got code in production, and code in development, you end up having a lot of wheels turning on your branching graph, as shown in Figure 3-12. This can be overwhelming for a newcomer, but it will be a natural progression for any developer who has worked on the project from the beginning. And if you choose to use this convention, it will also feel familiar to any new developer who has worked with this model previously.

The master branch has a hot fix committed and has been re-tagged release 1.0.1.
Figure 3-12. A hotfix is made, rolled into master, and our release tag is now 1.0.1

There are several advantages to using a scheduled deployment strategy:

  • Scheduled deployment does not require an extensive testing infrastructure to start using.

  • The process of building software, with phases for development, quality assurance, and production, is very common. This means GitFlow conventions will feel very familiar to software developers once they understand the process of how and where their typical tasks happen in the branching convention.

  • By adhering to conventions, developers should always be able to determine from which branch they should begin their work.

  • This is also a good model for versioned software, such as a product that you’d download from an app store where it is not appropriate to be deploying a new version every few days.

There are disadvantages to using a scheduled deployment branching strategy as well:

  • There is a lot of cognitive overhead for developers who are new to software deployment and haven’t experienced the process of walking a product through each phase of development.

  • If developers start their work from the wrong branch, it can be squirrelly to get everything back in sync.

  • It’s not as trendy as continuous deployment.

The scheduled deployment strategy offers the most rigid conventions about how code should be moved through the review gates. It is typically used when there is little to no automation for code review, and it is always present in some form for projects that are not using an automatic deployment scheme. Any time work is collated before being released, you will have at least some of the characteristics described in this section.

Updating Branches

This chapter has focused on common strategies used to isolate and merge streams of work. The strategies have focused on a single best-path scenario where branches of work are magically kept up to date with all relevant work happening elsewhere. In a distributed version control system the way you incorporate external work is independent of the branching strategy that you’ve chosen. When updating a branch, you can choose from one of two strategies: merging or rebasing. Before diving into the differences in these two strategies, let’s take a quick look at how connections are maintained between multiple repositories.

Every Git repository is an autonomous record of changes. Connections can be made between repositories by establishing a remote reference. This reference allows a developer to copy a record of all commit objects made in the remote repository to his or her local repository. Remote connections are typically made to repositories with at least a partially shared history. For example, the initial download of a repository using the command clone would result in a duplicate copy of the remote repository and its commit objects.
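To make the idea of a remote reference concrete, here is a minimal sketch that builds a throwaway "remote" repository in a temporary directory and clones it. The directory names, commit message, and example identity are assumptions made up for illustration; in real life the remote would usually live on a server.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

# A stand-in for the remote repository, holding a single commit.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master  # name the first branch master
git -C "$work/remote" commit -q --allow-empty -m "Initial commit"

# clone duplicates the commit objects and records where they came from.
git clone -q "$work/remote" "$work/local"

# The remote reference is stored under the name "origin".
origin_url=$(git -C "$work/local" config --get remote.origin.url)
echo "origin -> $origin_url"

# Both repositories now contain the same commit object.
remote_tip=$(git -C "$work/remote" rev-parse master)
local_tip=$(git -C "$work/local" rev-parse master)
echo "remote tip: $remote_tip"
echo "local tip:  $local_tip"
```

The two identifiers printed at the end should match, because the clone holds a copy of the very same commit object.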

Let’s say, for example, you wanted to add your work to your coworker’s branch. You make a connection to their remote repository, fetch their branch, and try to add your work. But you can’t! If it were a local branch, you could add a few new commit objects to the tip of the branch. The branch you fetched, however, is a reference copy of a remote branch, and only the owner of the remote repository can assign a new commit object as the tip of that branch. Instead, you must first create a new tracking branch to store your changes.
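The coworker scenario might look something like the following sketch. The remote name coworker, the branch name feature, and the temporary directories are all hypothetical; the commands themselves (remote add, fetch, checkout --track) are standard Git.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

# Your coworker's repository, with some work on a branch named "feature".
git init -q "$work/coworker"
git -C "$work/coworker" symbolic-ref HEAD refs/heads/master
git -C "$work/coworker" commit -q --allow-empty -m "Initial commit"
git -C "$work/coworker" checkout -q -b feature
git -C "$work/coworker" commit -q --allow-empty -m "Coworker's work"

# Your repository: connect to the coworker's repository and fetch their branches.
git init -q "$work/mine"
cd "$work/mine"
git remote add coworker "$work/coworker"
git fetch -q coworker

# coworker/feature is the reference copy. Create a local tracking branch
# from it, and add your own commit to the tip of that local branch.
git checkout -q --track -b feature coworker/feature
git commit -q --allow-empty -m "My addition"

upstream=$(git rev-parse --abbrev-ref 'feature@{upstream}')
ahead=$(git rev-list --count coworker/feature..feature)
echo "feature tracks $upstream and is $ahead commit(s) ahead"
```

Your commit lands on the local branch feature; the reference copy coworker/feature stays exactly where the last fetch left it.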

Some Tracking Branches are Automatic

By default the command clone will create a tracking branch for the remote repository’s default branch (traditionally named master) that is identical to the remote branch of the same name.

So now you have a local copy of a branch which you can add new commits to, a reference copy of the branch which you cannot add commits to, and the original branch still exists in the remote repository. Inevitably these branches will get out of sync as you and your coworker make changes to your respective repositories. Remember, when you update your local repository you have two branches you need to update. On its own the command fetch will update the reference copy of the branch, downloading any new commits. Your mutable tracking copy of the branch, however, can be updated in more than one way. This is because you are now merging two branches into one, an action for which there are multiple strategies in Git. And where there is choice, there is potential for disagreement on which method should be used.
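Here is a small sketch of that out-of-sync state, assuming a throwaway remote and clone in a temporary directory (all names and commit messages are invented for the example). It shows that fetch moves the reference copy, origin/master, while your own master stays put.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" commit -q --allow-empty -m "Initial commit"
git clone -q "$work/remote" "$work/local"

# Your coworker adds a commit to the remote; you add one locally.
git -C "$work/remote" commit -q --allow-empty -m "Coworker's change"
git -C "$work/local" commit -q --allow-empty -m "My change"

# fetch downloads the new commit and moves origin/master, but leaves master alone.
before=$(git -C "$work/local" rev-parse master)
git -C "$work/local" fetch -q origin
after=$(git -C "$work/local" rev-parse master)

behind=$(git -C "$work/local" rev-list --count master..origin/master)
ahead=$(git -C "$work/local" rev-list --count origin/master..master)
echo "master before fetch: $before"
echo "master after fetch:  $after"
echo "master is $ahead ahead of and $behind behind origin/master"
```

At this point the two branches have diverged by one commit each, and it is up to you to decide how to bring master up to date.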

The process of updating your tracking branch from its remote reference will typically be achieved by using the command pull. However, pull is a combination of two discrete steps: fetch and merge or fetch and rebase. By default the command pull uses the merge strategy to update the local branch; however, by adding the parameter --rebase, a developer can opt to bring his or her local branch up to date using a rebase strategy instead.
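To compare the two flavors of pull side by side, the sketch below makes two identical clones, gives each one local commit, and then updates one with a merge and the other with a rebase. The clone names merged and rebased and the helper function commit_file are hypothetical, invented for this example.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

# Hypothetical helper: add a one-line file to a repository and commit it.
commit_file() {
  echo "$2" > "$1/$2.txt"
  git -C "$1" add "$2.txt"
  git -C "$1" commit -q -m "Add $2"
}

git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
commit_file "$work/remote" base

# Two identical clones, each with one local commit of its own.
for name in merged rebased; do
  git clone -q "$work/remote" "$work/$name"
  commit_file "$work/$name" mine
done

# Meanwhile, new work lands in the remote repository.
commit_file "$work/remote" theirs

# pull with the merge strategy: fetch, then merge the fetched branch.
git -C "$work/merged" pull -q --no-rebase --no-edit origin master

# pull --rebase: fetch, then replay the local commit onto the fetched branch.
git -C "$work/rebased" pull -q --rebase origin master

merges_after_merge=$(git -C "$work/merged" rev-list --merges --count HEAD)
merges_after_rebase=$(git -C "$work/rebased" rev-list --merges --count HEAD)
echo "merge commits after pull:          $merges_after_merge"
echo "merge commits after pull --rebase: $merges_after_rebase"
```

The merge flavor records one merge commit; the rebase flavor records none, because the local commit was replayed on top of the fetched work instead.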

All Your Rebase Are Belong to Us

Rebasing can be used to update a sequence of commits in one of two ways. First, as an alternate method to merging when incorporating new work from a related branch (bringing a branch up to date). Second, to alter history on the existing branch by adding, changing, or removing individual commits in the branch’s history to make it more concise. This section refers to the former use of the term.

Rebasing has earned its reputation for being complicated and frustrating. But from a graphing perspective, rebasing is actually the easiest strategy to read. Figure 3-13 shows two branches before and after rebasing one branch onto another. Typically, we explain rebasing as replaying existing commits onto an existing time line. This analogy, although technically incorrect, works extremely well as a mental model for understanding the difference between merge and rebase.

While the command rebase is used to bring a branch up to date, the command merge is used to introduce completely new work. When the command merge is used with the fast-forward strategy, the resulting graph is virtually identical to the output of a rebased branch. A fast-forward merge only works if every commit on the branch receiving the merge is already included in the incoming branch. As Figure 3-14 shows, the graph for a fast-forward merge is as clean as rebasing.
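A fast-forward merge can be sketched in a few commands. The branch name feature and the commit messages are assumptions for illustration; the flag --ff-only asks Git to refuse the merge if a fast-forward is not possible.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"

# New work happens only on the feature branch.
git checkout -q -b feature
git commit -q --allow-empty -m "Feature work"

# master has no commits of its own, so Git can simply move its pointer forward.
git checkout -q master
git merge -q --ff-only feature

master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
merge_count=$(git rev-list --merges --count master)
echo "master:  $master_tip"
echo "feature: $feature_tip"
echo "merge commits: $merge_count"
```

Both branch names end up pointing at the same commit, and no merge commit is recorded, which is why the graph looks just like a rebase.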

Two diverging branches are redrawn so that the diverging branch appears as though it diverted later than it really did, thus incorporating more changes from the other branch.
Figure 3-13. Rebasing two branches changes the history of one branch so that it appears as though the other branch was always in place

When there is new work on both branches, and you want to combine the work, you will need to store the combined work in a new commit. Several different merge strategies can be applied, and Git will choose the best one for your particular situation. If you’re really curious about the different merge strategies, the Git help pages for merging can tell you how an octopus and a recursive merge are different. To read the documentation, run the command git help merge.
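When both branches have new work, the combined result is stored in a merge commit with two parents. The following sketch shows one way to see that for yourself; the branch name feature and the commit messages are invented for the example.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"

# New work on the feature branch...
git checkout -q -b feature
git commit -q --allow-empty -m "Feature work"

# ...and new work on master, so the two branches have diverged.
git checkout -q master
git commit -q --allow-empty -m "Mainline work"

# A fast-forward is no longer possible; Git stores the result in a new commit.
git merge -q --no-edit feature

# rev-list --parents prints the commit followed by each of its parents.
parents=$(git rev-list --parents -n 1 HEAD)
parent_count=$(( $(echo "$parents" | wc -w) - 1 ))
second_parent=$(git rev-parse 'HEAD^2')
echo "merge commit has $parent_count parents"
echo "second parent: $second_parent"
```

The second parent of the merge commit is the tip of feature, which is how the graph remembers where the merged work came from.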

Need Help Choosing Between Merge and Rebase?

The graphed output is virtually identical for two branches which have been combined using either merge with fast forward or rebase. This can make it confusing to know which one should be used at what point. So confusing, in fact, that some teams choose to use the commands interchangeably! If you invest a little time in understanding when to use which strategy you will have agility in using different branching strategies for different projects you may work on. Merge or Rebase? includes a decision tree diagram to help you identify when you should be using each of the two strategies.

Two diverging branches are redrawn into a single line of commits which include all commits from both branches; however, only one branch had new commits on it.
Figure 3-14. Merging two branches using the fast-forward strategy is as clean as rebasing

If you are merging to bring your work up to date, the graphed history can get quite difficult to read as the connections become bidirectional. In other words, history swerves between the two branches as the code is brought up to date and new features are published into the main branch. Figure 3-15 shows how a merge keeps a historical record of where something came from. This is great if you’re incorporating a feature branch into the main development branch for your project, but it can be quite confusing if you’re trying to read the history of only the current features because the main development branch will now be spaghettied into your history graph, with merged connections being drawn from both the feature branch and the integration branch.

As a result of this synchronization issue, developers using Git typically don’t work on the tracking branch when they are planning to submit their work back to a project. Instead, a developer will make a fourth copy of the branch (a copy of the tracking branch which is a copy of the reference branch which is a copy of the remote branch). Regardless of the branching strategy, a tracking branch generally maps onto any long-running branch (e.g., master, or a release branch), and the working branch is a feature, ticket, or hotfix branch.

Rebasing a branch to bring it up to date makes history easier to read by simplifying the graph. Rebasing does, however, come at a cost, especially if your copy of the branch contains commit objects you have created. In order to rebase a branch that has its own unique commits, you must replay each of your commits onto the new branch tip, and each replayed commit is assigned a completely new identifier in Git because it has been assigned a new parent. This can cause confusion if the commit that is assigned a new parent was one that had previously been shared in other remote repositories. In addition to the new identifiers, each time you replay a commit, there is a potential for a merge conflict, and conflicts are time consuming to deal with. It’s a little like keeping timesheets: so long as you invest a little time each day to keep your timesheets up to date, they’re no big deal. But if you’re really bad at remembering to make entries on your timesheets each day, it can be time consuming to try and catch up. The reward for maintaining an up-to-date branch through a rebasing strategy is an easy-to-read branch history. But is it worth it? It can cost novice Git users a fair amount of confidence if they are not entirely comfortable resolving merge conflicts.
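You can watch a commit receive a new identifier with a sketch like the one below. The branch name feature, the file names, and the helper function commit_file are hypothetical and exist only for this illustration.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

# Hypothetical helper: add a one-line file to the current repository and commit it.
commit_file() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "Add $1"
}

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
commit_file base

# One commit of your own on the feature branch.
git checkout -q -b feature
commit_file feature
old_id=$(git rev-parse feature)

# Meanwhile, master moves ahead.
git checkout -q master
commit_file mainline

# Replay the feature commit onto the new tip of master.
git checkout -q feature
git rebase -q master
new_id=$(git rev-parse feature)
new_parent=$(git rev-parse 'feature^')

echo "before rebase: $old_id"
echo "after rebase:  $new_id"
echo "new parent:    $new_parent"
```

The changes in the commit are the same before and after, but the identifier is different because the parent is now the tip of master. Anyone who already has the old commit will see the new one as an unrelated commit.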

Your homework is to talk with your team about which is more important: ease of use (choose merging to bring branches up to date), or an easier-to-read historical graph (choose rebasing to bring branches up to date).

Two diverging branches have commits which criss-cross in both directions.
Figure 3-15. Merging two branches without the fast-forward strategy

Summary

If you are working with a Git hosting system, such as GitHub, Bitbucket, or GitLab, a branch might be used to separate the work being done for a particular bug or feature ticket. Depending on your branching strategy, your goal may be to keep the branches separate indefinitely, or you may want to merge the branches every so often to combine the work that has been done separately into one deployable branch. Even though all of the information is stored in the repository, only one branch is ever visible at a time. The checked-out branch is visible in the working directory. So if you have two ideas that you’ve been working on and you want them both to be present on your server, you’ll need to merge the two branches into a common branch so that they can both appear at once.

This chapter covered several branching strategies that you can use with Git, along with variations within these strategies that have been used by some teams:

  • Mainline development

  • Branch-per-feature deployment

  • State branching

  • Scheduled deployment

In addition to these strategies, you will also need to decide how your team will incorporate new work into shared branches and keep branches up to date. For very novice teams, there is not always an obvious answer to how branches should be kept up to date. Two strategies were offered: rebasing or merging. A rebasing strategy can be more difficult, especially if it is not performed regularly; however, it does give your history a cleaner graph that is easier to review. If you use merges to keep your branch up to date, the history of your project will be more difficult to review. So if the origin of how your work came to be doesn’t matter, you can choose either strategy, but if you will be reviewing the history often, rebasing will make future work easier (even though it can be more time consuming in the moment).
