Chapter 9. Finding and Fixing Bugs

Even the best review processes will sometimes allow a bug into production. Perhaps the bug was introduced by a bad merge, or a scenario your tests didn’t cover. Whatever the cause of the problem, Git will be able to help you uncover at what point, and by whom, the offending code was introduced. This will allow you to understand the context of how the code ended up in the system, and tell you who the best person is to help you unpack a problem in an area of the code base you might not be familiar with.

There are two main ways to apply your forensic investigation skills: use the existing code to locate the problem, or use the history of the code to locate it. You will be most effective when you use both of these techniques. When I’m debugging code, for example, I almost always start by looking at the code itself. This habit is left over from all of the frontend web development I’ve done, where it’s easiest to use a tool like Firebug to pick apart a web page to find the offending CSS. It’s definitely not the only way to debug code, and for many projects it will not be a viable strategy.

In this chapter, you will learn how to:

  • Set aside your current work with stash so you can check out another branch

  • Find the history of a file with blame

  • Find the last working commit with bisect

By the end of this chapter, you will also have a better understanding of how the way you store history in Git today affects how you can recover from mistakes tomorrow. You will hopefully have a new appreciation for how useful a really great commit message can be, and see how a rebasing workflow can help you create a history that is easier to decipher with bisect. This chapter does not include instructions on how to undo the mistakes you find, because that was covered in Chapter 6.

Those who learn best by following along with video tutorials will benefit from Collaborating with Git (O’Reilly), the companion video series for this book.

Using stash to Work on an Emergency Bug Fix

In Chapter 6, you learned how to adjust commit messages, but in cases of emergency, it may actually be more appropriate to put your work on hold temporarily. This can be accomplished with the command stash. This command allows you to temporarily put aside something you are in the middle of, and which you want to return to at some point in the future.

Real-World Git Applications

One of my favorite Git-related one-liners was dropped by a friend, Jeff Eaton, at DrupalCon Prague. He made a comment, at exactly the right moment, about “having a git stash for morality.” I wish I could remember the context now (horror movies? beer gardens?), but the one-liner itself has stuck with me.

In the code sense of the command, stash allows you to avoid useless commits that need to be undone later. These useless commits often appear when you are in the middle of working on a problem but need to switch to a different branch temporarily. Git will refuse to switch branches if the checkout would overwrite your uncommitted changes, so a throwaway commit can seem like the only way out. Unlike a branch, or an individual commit, a stash cannot be shared; it is specific to your local repository.

To create a new stash that holds the changes currently in your working directory, you need to issue the command stash. If you prefer the clarity, you can include the parameter save. It is implied, though, so you don’t need to include it if you want to save a few keystrokes:

$ git stash save

Saved working directory and index state WIP on master: d7fe997 [9387] Adding test: check user exists
HEAD is now at d7fe997 [9387] Adding test: check user exists

You’ll notice this command only stashes files Git already knows about. New files that have never been added to the repository stay in your working directory while the other changes are tucked into the stash. These untracked files follow you to whichever branch you check out. Git will refuse to switch branches if a file from the other branch would overwrite one of them. To include untracked files in the stash, add the parameter --include-untracked (or its short form, -u):

$ git stash save --include-untracked

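Alternatively, if you want to throw out those new files instead of putting them into your stash, stash your tracked changes and then use clean. By default, Git refuses to clean unless you give it -f (force), so preview what will be deleted with -n (dry run) first:

$ git stash save
$ git clean -n -d
$ git clean -f -d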
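You can try both approaches in a throwaway repository. This minimal sketch assumes a Unix-like shell and Git 2.13 or later. It uses git stash push -m, the newer spelling of git stash save with a description. The scratch repository and the filenames notes.txt and new-idea.txt are hypothetical and exist only for the experiment.

```shell
set -e

# Scratch repository in a temporary directory, so real work is never touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# One edit to a tracked file, plus one brand-new (untracked) file.
echo "half-finished edit" >> notes.txt
echo "scratch" > new-idea.txt

# With --include-untracked, both changes go into the stash.
git stash push -q --include-untracked -m "tracked and untracked"
after_u=$(git status --porcelain)       # nothing is left behind
git stash pop -q                         # bring both changes back

# A plain stash takes only the tracked edit; new-idea.txt stays put.
git stash push -q -m "tracked only"
left_behind=$(git status --porcelain)   # "?? new-idea.txt"

# Preview with -n (dry run), then force (-f) the clean-up.
git clean -n -d
git clean -f -d
after_clean=$(git status --porcelain)   # empty again

stash_count=$(git stash list | wc -l | tr -d ' ')
```

The script runs in a directory created by mktemp, so it is safe to try anywhere.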

Each time you issue the command stash in a dirty working directory, a new stash will be created. You can see a list of your saved stashes by adding the parameter list:

$ git stash list

stash@{0}: WIP on master: d7fe997 [9387] Adding test: check user exists

If you only need to remember one stash, and only for a few minutes, this is probably okay. Your short-term memory may be able to retain exactly what happened to you a minute ago, but the longer you need to hold this memory, and the more memories you need to recall, the harder it’s going to be to remember what is in each stash.

To see the contents of a stash, use the command show. The patch for the selected stash will be displayed, including metadata and the stashed changes from your working directory:

$ git show stash@{0}

If you don’t think you will remember what you were working on from looking at the code, you can replace the default WIP message with a terse description of what you were working on when you stashed your working directory.

Adding a Description

If you want to include a description, you will need to explicitly include the parameter save.

Git allows you to store multiple stashes, so it can be especially helpful to name your stashes if you are working on a large problem and end up creating a stash multiple times from the same branch:

$ git stash save --include-untracked "terse description of the stashed work"

Now if you check your list of stashes again, you will see your previous stash as well as the new stash:

$ git stash list

stash@{0}: On master: terse description of the stashed work
stash@{1}: WIP on master: d79e997 Revert "Merge branch 'video-lessons' ...

The newest stash appears at the top of the list. Notice how the numbers used to refer to the stashes change as you create more stashes. Each number is a position in a stack, not a permanent reference. This can be a little confusing if you create multiple stashes in the same branch. If you give each stash a terse description, though, it is easier to recall which stash you want to apply when you’re ready to get back to work, and which stashes are old and ready to be deleted.
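You can watch the renumbering happen in a scratch repository. This sketch assumes Git 2.13 or later (for git stash push -m) and uses a hypothetical file.txt. It creates two described stashes and then drops the newest one. After the drop, the older stash moves up to stash@{0}.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Add file"

# Two interruptions on the same branch, each stash given a description.
echo "first idea" >> file.txt
git stash push -q -m "first idea"
echo "second idea" >> file.txt
git stash push -q -m "second idea"

git stash list    # newest first: stash@{0} is "second idea"

# Each stash's subject reads "On <branch>: <description>".
top=$(git log -1 --format=%s 'stash@{0}')
next=$(git log -1 --format=%s 'stash@{1}')

# Dropping the top entry renumbers the rest: "first idea" becomes stash@{0}.
git stash drop -q 'stash@{0}'
top_after_drop=$(git log -1 --format=%s 'stash@{0}')
```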

Stashed Work Can Be Applied to Any Branch

This command can also be used if you realize you are working in the wrong branch, but have not made any commits yet. You can stash your work, switch branches, and then reapply the work you brought with you in your stash.

Once you’re ready to return to work, you determine which stash you’d like to use, and then apply it:

$ git stash list
$ git stash apply stash@{0}

If you use the command apply, the stash will persist. This can be a little confusing if you start hoarding stashes. To remove a stash, use the command drop to delete it:

$ git stash drop stash@{0}

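If you know you’re a bit of a hoarder, and you think you might not be very good at cleaning up old stashes, you can apply and drop the stash with a single command, pop. If applying the stash causes a conflict, pop keeps the stash in the list so your work isn’t lost, and you’ll need to drop it yourself once the conflict is resolved. Assuming you have only one stash, the command is as follows: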

$ git stash pop

You can also pop off specific stashes using the same structure as apply and drop:

$ git stash pop stash@{0}
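The difference between apply and pop shows up in the stash list. This sketch works under the same assumptions as before: a scratch repository, a hypothetical file.txt, and Git 2.13 or later. It applies a stash, discards the applied copy, and then pops the same stash. After each step it counts the entries left in the list.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Add file"

echo "work in progress" >> file.txt
git stash push -q -m "wip"

# apply restores the changes but keeps the stash in the list.
git stash apply -q
count_after_apply=$(git stash list | wc -l | tr -d ' ')

# Throw away the applied copy so the same stash can be restored again.
git checkout -q -- file.txt

# pop restores the changes and then drops the stash.
git stash pop -q
count_after_pop=$(git stash list | wc -l | tr -d ' ')
last_line=$(tail -n 1 file.txt)
```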

When in Doubt, Git Assumes You Meant the Latest Stash

If you have only one stash stored, you don’t need to name the stash you want to work with. If you omit the name of the stash and there is more than one, Git uses the most recent stash (the top one on the list, named stash@{0}).

You should now be able to put your work on hold temporarily using the command stash. Although you can stash your work whenever you’d like, you should only use this command if you are truly interrupted. If you have a coherent unit of work completed, use commit instead. If you decide to add more work later, you can always choose to rebase your branch and combine the commits you’d made previously.

Comparative Studies of Historical Records

One of the most basic tools you can use to start the search for why code isn’t working is to compare the broken code to another instance of the code. You can do this easily by working with relative history. Instead of reading through the log for a particular branch, you can compare a branch to another branch, or to another point in time.

Most of these commands have appeared previously, but this time, look at them with specific questions in mind. Consider the commit history graph in Figure 9-1. There are two branches with a common history: one with a known bug and one that is known to be working. The branch with the non-working code has four commits that differ from the branch with the code that works. The working branch has only two new commits, which are not included in the broken branch.

A working branch (two commits), a broken branch (four commits), and a common tail of three commits.
Figure 9-1. Two branches diverged from a common ancestor with an unequal number of commits

Need a Sample Repository to Practice On?

If you want to try the following exercises, download a copy of the repository from the Git for Teams website. This repository has the necessary branches set up so that you don’t need to replicate the scenario.

Using the command log, you can isolate many pieces of history. Draw the diagram in a notebook and, for each command, circle the commits it shows. You can also try each of these commands with diff instead of log for a variation on the output.

On the current branch, this is how I would view everything except the most recently committed work:

$ git log HEAD^

On the current branch, this is how I would view everything except the three most recent commits:

$ git log HEAD~3

You can also make comparisons as if you were standing at different vantage points. You’re standing at the window of a tall building, looking out onto the street. You can see the rooftops of other, shorter buildings. Now imagine you’re standing on the street looking up at the tall building. You can see people sitting under the café umbrellas. In the context of Git, this means you can make comparisons using either branch as the vantage point:

$ git log <since_last_merge_to>..<what_has_been_added_here> --oneline

For example, this is how I would see what’s in the working branch but not in the broken branch:

$ git log broken..working

What about the opposite? How would I show which commits are in the broken branch, but missing from the working branch? Like this:

$ git log working..broken

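If I wanted to see the code that was introduced in the broken branch but is missing from the working branch, I would use three dots instead of two. With three dots, Git compares the broken branch against the commit where the two branches diverged, rather than against the current tip of the working branch:

$ git difftool working...broken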
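To check your circled diagram, the following sketch rebuilds the shape of Figure 9-1 in a throwaway repository. It has three shared commits, two commits only on working, and four only on broken. The branch names match the text; the files and commit messages are invented for the exercise. The sketch uses git rev-list --count to count the commits in each range. It uses git diff --name-only to compare the two-dot and three-dot forms. (difftool accepts the same revision arguments as diff, but opens an external tool, so it is not scripted here.)

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Common history: three commits.
for i in 1 2 3; do
    echo "base $i" >> base.txt
    git add base.txt
    git commit -q -m "base $i"
done
git branch working
git branch broken

# Two commits that exist only on working...
git checkout -q working
for i in 1 2; do
    echo "working $i" >> working.txt
    git add working.txt
    git commit -q -m "working $i"
done

# ...and four that exist only on broken.
git checkout -q broken
for i in 1 2 3 4; do
    echo "broken $i" >> broken.txt
    git add broken.txt
    git commit -q -m "broken $i"
done

git log --oneline HEAD~3            # broken, minus its three newest commits
git log --oneline broken..working   # in working, not in broken
git log --oneline working..broken   # in broken, not in working

only_working=$(git rev-list --count broken..working)
only_broken=$(git rev-list --count working..broken)

# Two dots compare the two branch tips; three dots compare the
# common ancestor with broken, i.e. only what broken added.
two_dot=$(git diff --name-only working..broken)
three_dot=$(git diff --name-only working...broken)
```

The two-dot diff also lists working.txt, because that file exists at the tip of working but not on broken. The three-dot form leaves it out, because broken never touched it.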

You can also make these comparisons with remote branches. Don’t forget to download the latest versions with fetch before making the comparisons:

$ git fetch
$ git log working..remote_nickname/broken

If you aren’t able to uncover sufficient information, you can use log with the parameter -S to search for commits that added or removed a specific string of text in the code itself. To search the commit messages instead, use the parameter --grep. Message searches become significantly more useful if you use controlled vocabularies for your commit messages. For example, I always try to include the name of the file, or an equivalent shorthand, in the commit message so that I can easily filter on it later. (When this file is added to the repository for the book, it will get a commit message that includes the text CH09.)

$ git log -S foo
$ git log --grep=CH09

If you were excited by the parameter -S, have I got news for you! You can also search with regular expressions: the parameter -G finds commits whose added or removed lines match a regular expression.
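The three kinds of search answer different questions, which is easiest to see side by side. This sketch builds a hypothetical settings.conf over three commits. Two of those commits use the CH09 vocabulary in their messages. The sketch then runs each search and keeps only the commit subjects.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'timeout = 30\n' > settings.conf
git add settings.conf
git commit -q -m "CH09: add settings"

printf 'timeout = 30\nretries = 3\n' > settings.conf
git commit -q -am "CH09: add retries"

printf 'timeout = 60\nretries = 3\n' > settings.conf
git commit -q -am "Tune timeout"

# -S: commits that changed how many times the string appears in the code.
pickaxe=$(git log -S "timeout = 30" --format=%s)

# --grep: commits whose message matches.
by_message=$(git log --grep=CH09 --format=%s)

# -G: commits whose added or removed lines match a regular expression.
by_regex=$(git log -G 'timeout = [0-9]' --format=%s)
```

"CH09: add retries" does not show up in either code search, even though its file still contains "timeout = 30". That commit neither added nor removed that line.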

Using these commands should help you to isolate which files might be causing the problems. Once you have the filenames, you can investigate them more closely.

Investigating File Ancestry with blame

When working with teams, it can be very useful to see who has worked on a file over time. The people working on files are the ones best equipped to walk through the history of why something was changed—especially if the commit messages aren’t giving any additional clues. Normally we use the command log to reveal how a repository has changed over time, but this doesn’t give a very good overview of how all of those changes have come together to make the file you are currently looking at.

The command blame allows you to look at a file line by line, showing the last time each line was changed, by whom, and in which commit it was changed (Figure 9-2).

List of commit IDs, and arrows pointing to lines in a file from a single commit ID -- showing these lines were introduced at that commit.
Figure 9-2. blame allows you to list when each line was introduced into a file, by its commit ID and author

To examine the file README.md, use the blame command as shown in Example 9-1.

Example 9-1. Output of the command blame
$ git blame README.md

3e9dd558 (emmajane          2014-04-23 22:11:40 -0400  1) Git for Teams of One...
^00de359 (Emma Jane         2014-04-23 18:54:03 -0700  2) =====================
^00de359 (Emma Jane         2014-04-23 18:54:03 -0700  3)
3e9dd558 (emmajane          2014-04-23 22:11:40 -0400  4) Supporting files for ...
7874193c (emmajane          2014-06-26 00:37:41 -0400  5) developer work flow for ...
3e9dd558 (emmajane          2014-04-23 22:11:40 -0400  6) version control system, git
3e9dd558 (emmajane          2014-04-23 22:11:40 -0400  7)
00000000 (Not Committed Yet 2015-01-15 21:08:09 +0000  8) Test edit!
00000000 (Not Committed Yet 2015-01-15 21:08:09 +0000  9)
3e9dd558 (emmajane          2014-04-23 22:11:40 -0400 10) ## Contents
3e9dd558 (emmajane          2014-04-23 22:11:40 -0400 11)
5cc35764 (emmajane          2014-06-25 17:45:38 -0400 12) */slides*
3e9dd558 (emmajane          2014-04-23 22:11:40 -0400 13)

From left to right, the columns show:

  • Commit hash ID

  • Author name

  • Date

  • Line number

  • Content for that particular line within the file

In Example 9-1, you may have noticed there were three authors listed: Not Committed Yet, emmajane, and Emma Jane. Hopefully the first is self-explanatory: these are changes that are in my working directory but that are not yet committed. The two variations of my name are a simple inconsistency in how I’ve configured Git over time. You can read more about how to customize your attributed name in Appendix C.

Two of the lines begin with ^. These lines have not been edited since the initial commit.
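You can reproduce both markers in a scratch repository. In this sketch, the README.md contents are hypothetical. The first three lines are left untouched since the root commit. One uncommitted line is appended before running blame.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'Title\n=====\n\nIntro\n' > README.md
git add README.md
git commit -q -m "Initial commit"

printf 'Title\n=====\n\nIntro, revised\n' > README.md
git commit -q -am "Revise intro"

# An edit that has not been committed yet.
echo "Test edit!" >> README.md

git blame README.md

# Lines 1-3 still come from the root commit, so blame marks them with ^.
boundary_lines=$(git blame README.md | grep -c '^\^')

# The uncommitted line is attributed to "Not Committed Yet".
uncommitted_lines=$(git blame README.md | grep -c 'Not Committed Yet')
```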

Beware! The Word “blame” May Condition You into Negative Thinking

The command blame is poorly named. It immediately, and unnecessarily, creates an antagonistic view of the code. I much prefer the commands used in one of Git’s competitors, Bazaar: annotate, also available under the alias praise. (Full disclosure, Bazaar also has an alias of blame for annotate.) Git does have an annotate command, but the documentation for this command states that it is only for compatibility reasons. It is not a true alias and the output of blame and annotate differs slightly.

The last person who changed a line of code is often the person most qualified to explain what they were trying to accomplish; coming to them with a fight on your hands is going to decrease the likelihood they’ll come to you for help in the future, which increases the chance of you needing to deal with their future mistakes as well. Check your attitude when using this command, and see if you can shift from blame thinking to simple annotation.

Once you’ve located the line in the file that looks interesting, you can investigate further using the commit ID along with the commands log, diff, and show. Table 9-1 outlines what each of the commands can help you to isolate.

Table 9-1. Reasons to use log, diff, and show

Description                                        Command
Show the metadata for a particular commit          log -1 commit
Show the code changed in a particular commit       show commit
Show the code changed since a particular commit    diff commit

Start by using the command log to look at the commit message:

$ git log -1 <commit>

If the commit message was well written, it should give you an explanation for why the changes were made in this particular commit. If the detailed commit message includes a reference back to a ticket number in your project management system, you may even be able to read a discussion for the changes made—giving you even more insight into what the developers were thinking when they created the fix. In the tracking system, you may also see other developers who were involved, and anyone who was on the review team for this particular change.

To see the same amount of detail for every commit made after that point, use the command log with a commit range:

$ git log --patch <commit>..HEAD

The parameter --patch in this context shows you the changes introduced by each commit. The command diff, by contrast, shows you the difference between the referenced commit and the files in the working directory.
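Putting the pieces together, this sketch uses blame to find the commit that last touched a line. It then inspects that commit with log and show, and uses a commit range to list everything that came after it. The blame options -L (limit to a line range) and --porcelain (machine-readable output) appear here only to make the commit ID easy to extract in a script. The repository, files, and commit messages are hypothetical.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'Title\n\nIntro\n' > README.md
git add README.md
git commit -q -m "Initial commit"

printf 'Title\n\nIntro, now with a link to the issue tracker\n' > README.md
git commit -q -a -m "Link README intro to issue tracker"

echo "Contributing notes" > CONTRIBUTING.md
git add CONTRIBUTING.md
git commit -q -m "Add contributing notes"

# 1. Which commit last touched line 3? Porcelain output starts with the full hash.
commit=$(git blame -L 3,3 --porcelain README.md | head -n 1 | cut -d ' ' -f 1)

# 2. Read that commit's metadata and message (one commit only).
git log -1 "$commit"
subject=$(git log -1 --format=%s "$commit")

# 3. See which files that commit changed.
git show --stat "$commit"

# 4. Every commit made after it, with patches.
git log --patch "$commit"..HEAD
since_count=$(git rev-list --count "$commit"..HEAD)
```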

blame Only Tells You About What Is Visible

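blame is not perfect. Suppose the bug was introduced in a line that is no longer present in the version of the file you are looking at (for example, because the line was deleted). In that case, blame cannot tell you who last edited it; log with the parameter -S is a better tool for tracking down text that has disappeared. So blame is a good tool to use, but it is not magic.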

Using a combination of blame, log, and diff, you should now be able to review the history of a single file in the context of the total combined history of that file, and in the context of other changes made at the same time. Using the commit message, you may also be able to trace the rationale of why the changes were made. With a little bit of forensic investigation, you can turn your questioning of the author of the code into a productive conversation—instead of a Columbo-style interrogation.

Historical Reenactment with bisect

Often it can be difficult to figure out exactly when a bug was introduced in your code if you don’t know which file is the problem. If the error message you are looking for is printed to the screen, it can be relatively easy to search through the files in your code base to locate the right file. Sometimes the error message will include the filename and line number where the problem occurred. In any of these cases, you can use the commands diff, log, and blame to gain a better understanding of what has gone wrong. Sometimes the problem code does not leave sufficient clues in the error messages to use these tools. Introducing bisect!

bisect performs a binary search through past commits to help you find the commit where the code went from a known working state to a known broken state. Unlike a regular checkout of a commit, bisect continues to wander through your history (in a very methodical way!) until you have given it enough clues to identify which commit introduced the dysfunctional code. It’s sort of like a historical reenactment of what the developers have done in a code base. At each point in the bisect process, you can launch the product (compile the code; load it in a browser; install the app on your phone; whatever is appropriate for your code base) and determine whether the code at this moment in history was right, or wrong. Once you find the point where things went wrong, you can fix history at that exact moment. It’s like Back to the Future—and Git is your DeLorean.

To begin, you need to be in the top-level directory of your repository. This is the folder where the hidden .git folder resides. Begin the bisect process, and notify Git of one commit ID where the code is known to be good and one commit ID where the code is known to be bad (Example 9-2).

Example 9-2. Identify good and bad commits to bisect
$ git bisect start
$ git bisect good <commit-id>
$ git bisect bad <commit-id>

Git will now proceed to check out a series of commits one at a time, looking for the commit where the code went from good to bad:

$ git bisect start
$ git bisect bad c04f374
$ git bisect good 93b64fc

Bisecting: 10 revisions left to test after this (roughly 4 steps)
[0075f7eda67326f1746] Merge branch 'video-lessons' into integration_test

The repository is now in a detached HEAD state. At this point, you need to confirm if the code is good or bad and report back your findings:

$ git bisect bad

Bisecting: 5 revisions left to test after this (roughly 3 steps)
[ed8056eb4b2aaf00e6d] Lesson 4: Adding details on using git config


$ git bisect bad

Bisecting: 2 revisions left to test after this (roughly 1 step)
[c88a02babc42bb00a83] Lesson 4: Adding new lesson on configuring Git


$ git bisect good

Bisecting: 0 revisions left to test after this (roughly 1 step)
[f1fa8e7e382f68c0558] Lesson 3: Extended descriptions for cloning a ...


$ git bisect good

ed8056eb4b2aaf00e6d is the first bad commit
commit ed8056eb4b2aaf00e6d9d183f974ed612d6f10e6
Author: emmajane <[email protected]>
Date:   Sun Sep 7 12:50:58 2014 +0100

    Lesson 4: Adding details on using git config

    Added commands to customize the following:

    - username (or real name, as you prefer)
    - email address
    - enable color helpers within the git messages

    Added a self-study piece on customizing your command prompt to include
    additional color and branch information.

:040000 040000 e927a1263e6e23eb5237a363a20640f62349b27d 31bc6c57d6acd8de214a63a47914b32d6809a866 M	lessons

The problem commit has been located. At this point, you are still in a detached HEAD state, but you now know which commit you need to come back to. To return to the branch you were on when you started, use the subcommand reset. You can also use this subcommand at any point during the bisect process to abandon the search:

$ git bisect reset
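Here is a self-contained sketch of the whole search. Instead of launching a real product, it uses a hypothetical stand-in test: does app.txt contain the word BUG? The script answers git bisect with the result of that test at each step. The bug is planted in commit 13 of 20, so the search should land there.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Twenty commits; the hypothetical bug (the word BUG) appears from commit 13 on.
for i in $(seq 1 20); do
    if [ "$i" -ge 13 ]; then
        echo "version $i BUG" > app.txt
    else
        echo "version $i ok" > app.txt
    fi
    git add app.txt
    git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)   # the very first commit
git bisect start
git bisect bad HEAD
git bisect good "$good"

# Stand-in for "launch the product and look": bad if app.txt contains BUG.
first_bad=""
for step in $(seq 1 10); do                 # bounded, in case something goes wrong
    if grep -q BUG app.txt; then verdict=bad; else verdict=good; fi
    out=$(git bisect "$verdict")
    sha=$(printf '%s\n' "$out" | sed -n 's/ is the first bad commit$//p')
    if [ -n "$sha" ]; then
        first_bad=$(git log -1 --format=%s "$sha")
        break
    fi
done

echo "first bad commit: $first_bad"
git bisect reset                            # back to the original branch
```

Git can also drive this loop for you: git bisect run takes a command that exits with 0 for a good commit and a nonzero status for a bad one.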

If you have not done a lot of programming, the binary search process can feel a bit like magic. (Really freaking cool magic, mind you.) If you want to remove some of the mystery, you can use the subcommand visualize to show you the current status of the bisect process (Figure 9-3). If a graphical display is available, Git opens gitk with the outer good and bad commits marked. Otherwise, it falls back to showing the remaining suspects with log.

A screen shot of the gitk output for a bisect process. Two commits are identified as being uniquely 'good' and 'bad'.
Figure 9-3. Running git bisect visualize shows you the current status of the bisect process

Bisect Assumes Bad Things Have Happened

Bisect assumes that the newer commit is bad and the older commit is good. It searches for the point where something broke, not for the point where something was fixed. You can still use it to find where a fix was introduced, but it can be very confusing: you need to remember to reverse the definitions of good and bad.
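Git 2.7 and later offer a way around that confusion. git bisect start accepts --term-old and --term-new, so you can name the two states whatever makes sense for your search. This sketch plants a hypothetical fix (the word FIXED in app.txt) in commit 7 of 10. It then searches for the fix using the terms broken and fixed.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Ten commits; the hypothetical fix (the word FIXED) lands in commit 7.
for i in $(seq 1 10); do
    if [ "$i" -ge 7 ]; then
        echo "version $i FIXED" > app.txt
    else
        echo "version $i broken" > app.txt
    fi
    git add app.txt
    git commit -q -m "commit $i"
done

oldest=$(git rev-list --max-parents=0 HEAD)

# Name the states so that "searching for a fix" reads naturally.
git bisect start --term-old=broken --term-new=fixed
git bisect fixed HEAD
git bisect broken "$oldest"

first_fixed=""
for step in $(seq 1 10); do                 # bounded, in case something goes wrong
    if grep -q FIXED app.txt; then verdict=fixed; else verdict=broken; fi
    out=$(git bisect "$verdict")
    sha=$(printf '%s\n' "$out" | sed -n 's/ is the first fixed commit$//p')
    if [ -n "$sha" ]; then
        first_fixed=$(git log -1 --format=%s "$sha")
        break
    fi
done

echo "fix introduced in: $first_fixed"
git bisect reset
```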

Summary

I will happily admit that I am a crime drama TV junkie, so a chapter on using Git for forensic investigation appeals to me greatly. In this chapter, you have been exposed to a few of the commands I include in my detective toolkit:

  • stash allows you to set aside your current work so you can check out another branch.

  • blame allows you to find the line-by-line history of a file.

  • bisect allows you to search methodically through history to find the spot where things went wrong.

These tools, when paired with the information in Chapter 6 on recovering from mistakes, will help you dig into, and recover from, just about any crime scene you may end up investigating.
