Even the best review processes will sometimes allow a bug into production. Perhaps the bug was introduced by a bad merge, or a scenario your tests didn’t cover. Whatever the cause of the problem, Git will be able to help you uncover at what point, and by whom, the offending code was introduced. This will allow you to understand the context of how the code ended up in the system, and tell you who the best person is to help you unpack a problem in an area of the code base you might not be familiar with.
There are two main ways to apply your forensic investigating skills: use the existing code to locate the problem and use the history of the code to locate the problem. You will be most effective when you use both of these techniques. When I’m debugging code, for example, I almost always start by looking at the code itself. This is left over from all of the frontend web development I’ve done, where it’s easiest to use a tool like Firebug to pick apart a web page to find the offending CSS. It’s definitely not the only way to debug code—and for many projects it will not be a viable strategy.
In this chapter, you will learn how to:
Set aside your current work with stash so you can check out another branch
Find the history of a file with blame
Find the last working commit with bisect
By the end of this chapter, you will also have a better understanding of how the way you store history in Git today affects how you can recover from mistakes tomorrow. You will hopefully have a new appreciation for how useful a really great commit message can be, and see how a rebasing workflow can help you create a history that is easier to decipher with bisect. This chapter does not include instructions on how to undo the mistakes you find, because that was covered in Chapter 6.
Those who learn best by following along with video tutorials will benefit from Collaborating with Git (O’Reilly), the companion video series for this book.
In Chapter 6, you learned how to adjust commit messages, but in cases of emergency, it may actually be more appropriate to put your work on hold temporarily. This can be accomplished with the command stash. This command allows you to temporarily put aside something you are in the middle of, and which you want to return to at some point in the future.
One of my favorite Git-related one-liners was dropped by a friend, Jeff Eaton, at DrupalCon Prague. He made a comment, at exactly the right moment, about “having a git stash for morality.” I wish I could remember the context now (horror movies? beer gardens?), but the one-liner itself has stuck with me.
In the code sense of the command, stash allows you to avoid useless commits that need to be undone later. These useless commits are often introduced if you are currently working on a problem, but need to switch to a different branch temporarily, because Git will refuse to switch branches if doing so would overwrite your uncommitted changes. Unlike a branch, or an individual commit, a stash cannot be shared; it is specific to your local repository.
To create a new stash that holds the changes currently in your working directory, you need to issue the command stash. If you prefer the clarity, you can include the parameter save. It is implied, though, so you don’t need to include it if you want to save a few keystrokes:
$ git stash save
Saved working directory and index state WIP on master: d7fe997 [9387] Adding test: check user exists
HEAD is now at d7fe997 [9387] Adding test: check user exists
You’ll notice this command will only stash files Git already knows about. If you have new files that have not been committed previously, these files will not be incorporated into the stash as the other changes are tucked away. They stay behind in your working directory, and Git will refuse to switch to a branch where they would be overwritten. To include untracked files, add the parameter --include-untracked:
$ git stash save --include-untracked
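Alternatively, if you want to throw out those new files instead of putting them into your stash, you can run the commands as follows. By default, clean refuses to delete anything unless you add the parameter -f (try -n first if you want a dry run):
$ git stash save
$ git clean -d -f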
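If you want to watch the difference between a plain stash and one that includes untracked files, the following is a minimal sketch you can run in a scratch location. It assumes Git is installed; the temporary repository, the filenames (notes.txt and new-file.txt), and the author details are all invented for the demonstration.

```shell
# Work in a throwaway repository so nothing real is touched.
# The repository, filenames, and author details are invented for this sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# One change to a tracked file, plus one brand-new (untracked) file.
echo "more notes" >> notes.txt
echo "scratch" > new-file.txt

# A plain stash tucks away the tracked change only.
git stash save > /dev/null
plain_status="$(git status --porcelain)"
echo "after plain stash: $plain_status"

# Bring the work back, then stash again with the untracked file included.
git stash pop > /dev/null
git stash save --include-untracked > /dev/null
full_status="$(git status --porcelain)"
echo "after --include-untracked: ${full_status:-(clean)}"
```

After the plain stash, the new file should still be reported as untracked; after the second stash, the working directory should be clean.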
Each time you issue the command stash in a dirty working directory, a new stash will be created. You can see a list of your saved stashes by adding the parameter list:
$ git stash list
stash@{0}: WIP on master: d7fe997 [9387] Adding test: check user exists
If you only need to remember one stash, and only for a few minutes, this is probably okay. Your short-term memory may be able to retain exactly what happened to you a minute ago, but the longer you need to hold this memory, and the more memories you need to recall, the harder it’s going to be to remember what is in each stash.
To see the contents of a stash, use the command show. The patch for the selected stash will be displayed, including metadata and the stashed changes from your working directory:
$ git show stash@{0}
If you don’t think you will remember what you were working on from looking at the code, you can replace the default message with a terse description of what you were working on when you stashed your working directory. If you want to include a description, you will need to explicitly include the parameter save.
Git allows you to store multiple stashes, so it can be especially helpful to name your stashes if you are working on a large problem and end up creating a stash multiple times from the same branch:
$ git stash save --include-untracked "terse description of the stashed work"
Now if you check your list of stashes again, you will see your previous stash as well as the new stash:
$ git stash list
stash@{0}: On master: terse description of the stashed work
stash@{1}: WIP on master: d79e997 Revert "Merge branch 'video-lessons' ...
The newest stash will appear at the top of the list. Notice how the numbers used to refer to the stashes change as you create more stashes—it’s a variable assignment, not a permanent reference number. This can be a little confusing if you create multiple stashes in the same branch—but if you give each stash a terse description, it can be easier to recall which stash you want to apply when you’re ready to get back to work, and which stashes are now old and ready to be deleted.
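To see the renumbering in action, here is a small sketch that creates two described stashes from the same branch and lists them. The scratch repository, the file page.html, and both descriptions are hypothetical; the only assumption is that Git is installed.

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "<h1>Home</h1>" > page.html
git add page.html
git commit -q -m "Add page"

# First interruption: stash the header work with a description.
echo "<header>draft</header>" >> page.html
git stash save "header tweak in progress" > /dev/null

# Second interruption: stash the footer work with its own description.
echo "<footer>draft</footer>" >> page.html
git stash save "footer tweak in progress" > /dev/null

# The most recent stash takes the stash@{0} slot; the older one moves down.
git stash list
newest="$(git stash list | head -n 1)"
oldest="$(git stash list | tail -n 1)"
```

The header stash was stash@{0} when it was created, but it should now be listed as stash@{1}, which is why the descriptions are more reliable than the numbers.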
This command can also be used if you realize you are working in the wrong branch, but have not made any commits yet. You can stash your work, switch branches, and then reapply the work you brought with you in your stash.
Once you’re ready to return to work, you determine which stash you’d like to use, and then apply it:
$ git stash list
$ git stash apply stash@{0}
If you use the command apply, the stash will persist. This can be a little confusing if you start hoarding stashes. To delete a stash, use the command drop:
$ git stash drop stash@{0}
If you know you’re a bit of a hoarder, and you think you might not be very good at cleaning up old stashes, you can apply and drop the stash with a single command, pop. Assuming you have only one stash, the command is as follows:
$ git stash pop
You can also pop off specific stashes using the same structure as apply and drop:
$ git stash pop stash@{0}
If you have only one stash stored, you don’t need to name the stash you want to work with. If you omit the name of the stash, and there is more than one, Git will use the most recent stash (the top one on the list; it will be named stash@{0}).
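The difference between apply and pop is easy to check in a scratch repository. This sketch (the repository, the file todo.txt, and the stash description are made up) counts the stashes left behind after each command.

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "buy milk" > todo.txt
git add todo.txt
git commit -q -m "Add todo list"

echo "work in progress" >> todo.txt
git stash save "interrupted work" > /dev/null

# apply restores the changes but leaves the stash in the list.
git stash apply "stash@{0}" > /dev/null
after_apply="$(git stash list | wc -l | tr -d ' ')"

# Throw away the re-applied change so the same stash can be restored again.
git checkout -q -- todo.txt

# pop restores the changes and removes the stash in one step.
git stash pop > /dev/null
after_pop="$(git stash list | wc -l | tr -d ' ')"

echo "stashes after apply: $after_apply"
echo "stashes after pop:   $after_pop"
```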
You should now be able to put your work on hold temporarily using the command stash. Although you can stash your work whenever you’d like, you should only use this command if you are truly interrupted. If you have a coherent unit of work completed, use commit instead. If you decide to add more work later, you can always choose to rebase your branch and combine the commits you’d made previously.
One of the most basic tools you can use to start the search for why code isn’t working is to compare the broken code to another instance of the code. You can do this easily by working with relative history. Instead of reading through the log for a particular branch, you can compare a branch to another branch, or to another point in time.
Most of these commands have appeared previously, but this time, look at them with specific questions in mind. Consider the commit history graph in Figure 9-1. There are two branches with a common history: one with a known bug and one that is known to be working. The branch with the non-working code has four commits that differ from the branch with the code that works. The working branch has only two new commits, which are not included in the broken branch.
If you want to try the following exercises, download a copy of the repository from the Git for Teams website. This repository has the necessary branches set up so that you don’t need to replicate the scenario.
Using the command log, you can isolate many pieces of history. Draw the diagram in a notebook, and circle the commits that each of the commands shows. You can also try each of these commands with diff instead of log for a variation on the output.
On the current branch, this is how I would view everything except the most recently committed work:
$ git log HEAD^
On the current branch, this is how I would view everything except the three most recent commits:
$ git log HEAD~3
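If the ^ and ~ shorthand feels abstract, this sketch builds a scratch repository with five numbered commits (the repository, the file log.txt, and the commit messages are invented) and shows which commit each form starts from.

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Five commits in a straight line, so the arithmetic is easy to follow.
for i in 1 2 3 4 5; do
  echo "change $i" >> log.txt
  git add log.txt
  git commit -q -m "Commit $i"
done

# HEAD^ starts the log one commit back; HEAD~3 starts it three commits back.
skip_one="$(git log --format=%s HEAD^ | head -n 1)"
skip_three="$(git log --format=%s HEAD~3 | head -n 1)"
remaining="$(git log --format=%s HEAD~3 | wc -l | tr -d ' ')"

echo "HEAD^ starts at:  $skip_one"
echo "HEAD~3 starts at: $skip_three ($remaining commits listed)"
```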
You can also make comparisons as if you were standing at different vantage points. You’re standing at the window of a tall building, looking out onto the street. You can see the rooftops of other, shorter buildings. Now imagine you’re standing on the street looking up at the tall building. You can see people sitting under the café umbrellas. In the context of Git, this means you can make comparisons using either branch as the vantage point:
$ git log since_last_merge_to..what's_been_added_here --oneline
For example, this is how I would see what’s in the working branch, but not in the broken branch:
$ git log broken..working
What about the opposite? How would I show which commits are in the broken branch, but missing from the working branch? Like this:
$ git log working..broken
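The direction of the two dots is easy to mix up, so here is a sketch that rebuilds a scenario like the one described for Figure 9-1 in a scratch repository: one shared commit, four commits only on broken, and two commits only on working. The helper commit_change and the commit messages are invented for the demonstration.

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: record one line of change as one commit.
commit_change() {
  echo "$1" >> history.txt
  git add history.txt
  git commit -q -m "$1"
}

commit_change "shared: initial commit"
git branch working
git branch broken

git checkout -q broken
for i in 1 2 3 4; do commit_change "broken: change $i"; done

git checkout -q working
for i in 1 2; do commit_change "working: change $i"; done

# A..B lists the commits reachable from B that are not reachable from A.
only_in_broken="$(git log --oneline working..broken | wc -l | tr -d ' ')"
only_in_working="$(git log --oneline broken..working | wc -l | tr -d ' ')"

echo "commits only in broken:  $only_in_broken"
echo "commits only in working: $only_in_working"
```

Reading the range as "what does the right side have that the left side is missing?" is one way to keep the direction straight.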
If I wanted to see the code that was included in the broken branch, but missing in the working branch, I would do this:
$ git difftool working..broken
You can also make these comparisons with remote branches. Don’t forget to download the latest versions with fetch before making the comparisons:
$ git fetch
$ git log working..remote_nickname/broken
If you aren’t able to uncover sufficient information, you can use log with the parameter -S to search for commits that added or removed a specific string of text as part of the committed change. To search the text of the commit messages instead, use the parameter --grep. Searching messages in this way is made significantly more useful if you use controlled vocabularies for your commit messages. For example, I always try to include the name of the file, or an equivalent shorthand, in the commit message so that I can easily filter on it later (when this file is added to the repository for the book, it will get a commit message which includes the text CH09). To find the commits that added or removed the string foo:
$ git log -S foo
If you were excited by the parameter -S, have I got news for you! There is also the ability to search the changes based on regular expressions. Use the parameter -G.
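Here is a sketch comparing the search parameters side by side in a scratch repository. In it, -S and -G look at the text that was added or removed in each commit, while --grep (a separate log parameter) looks at the commit messages. The file style.css, its contents, and the commit messages are invented for the demonstration.

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'color: red;' > style.css
git add style.css
git commit -q -m "CH09: add base styles"

echo 'margin: 0;' >> style.css
git commit -q -am "CH09: reset margins"

printf 'color: blue;\nmargin: 0;\n' > style.css
git commit -q -am "Switch brand color"

# -S: commits where the number of occurrences of the string changed.
by_string="$(git log --format=%s -S 'red')"

# -G: commits whose added or removed lines match the regular expression.
by_regex="$(git log --format=%s -G 'margin: *[0-9]')"

# --grep: commits whose message matches the pattern.
by_message="$(git log --format=%s --grep 'CH09')"

printf -- '-S red:\n%s\n\n' "$by_string"
printf -- '-G margin:\n%s\n\n' "$by_regex"
printf -- '--grep CH09:\n%s\n' "$by_message"
```

The -S search should report both the commit that introduced the string and the commit that removed it, which makes it handy for tracking down text that is no longer in the file.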
Using these commands should help you to isolate which files might be causing the problems. Once you have the filenames, you can investigate them more closely.
When working with teams, it can be very useful to see who has worked on a file over time. The people working on files are the ones best equipped to walk through the history of why something was changed, especially if the commit messages aren’t giving any additional clues. Normally we use the command log to reveal how a repository has changed over time, but this doesn’t give a very good overview of how all of those changes have come together to make the file you are currently looking at.
The command blame allows you to look at a file line by line, showing the last time each line was changed, by whom, and in which commit it was changed (Figure 9-2).
To examine the file README.md, use the blame command as shown in Example 9-1.
$ git blame README.md
3e9dd558 (emmajane 2014-04-23 22:11:40 -0400 1) Git for Teams of One...
^00de359 (Emma Jane 2014-04-23 18:54:03 -0700 2) =====================
^00de359 (Emma Jane 2014-04-23 18:54:03 -0700 3)
3e9dd558 (emmajane 2014-04-23 22:11:40 -0400 4) Supporting files for ...
7874193c (emmajane 2014-06-26 00:37:41 -0400 5) developer work flow for ...
3e9dd558 (emmajane 2014-04-23 22:11:40 -0400 6) version control system, git
3e9dd558 (emmajane 2014-04-23 22:11:40 -0400 7)
00000000 (Not Committed Yet 2015-01-15 21:08:09 +0000 8) Test edit!
00000000 (Not Committed Yet 2015-01-15 21:08:09 +0000 9)
3e9dd558 (emmajane 2014-04-23 22:11:40 -0400 10) ## Contents
3e9dd558 (emmajane 2014-04-23 22:11:40 -0400 11)
5cc35764 (emmajane 2014-06-25 17:45:38 -0400 12) */slides*
3e9dd558 (emmajane 2014-04-23 22:11:40 -0400 13)
From left to right, the columns show:
Commit hash ID
Author name
Date
Line number
Content for that particular line within the file
In Example 9-1, you may have noticed there were three authors listed: Not Committed Yet, emmajane, and Emma Jane. Hopefully the first is self-explanatory: these are changes that are in my working directory but that are not yet committed. The two variations of my name are a simple inconsistency in how I’ve configured Git over time. You can read more about how to customize your attributed name in Appendix C.
Two of the lines begin with ^. These lines have not been edited since the initial commit.
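To experiment with blame without touching a real project, you could use a sketch like the following. It creates a scratch repository in which two differently named authors edit README.md, leaves one line uncommitted, and then asks blame about individual lines using the parameters -L (limit to a line range) and --porcelain (machine-readable output). The repository, the author names, and the file contents are all invented.

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Title\nIntro\n' > README.md
git add README.md
git commit -q -m "Add README"

# A second (hypothetical) author rewrites line 2.
printf 'Title\nBetter intro\n' > README.md
git -c user.name="Second Author" commit -q -am "Improve intro"

# An edit that is left uncommitted in the working directory.
echo "Test edit!" >> README.md

# The human-readable view: one row per line of the file.
git blame README.md

# The porcelain view makes it easy to pull out a single field for one line.
line1_author="$(git blame -L 1,1 --porcelain README.md | sed -n 's/^author //p')"
line2_author="$(git blame -L 2,2 --porcelain README.md | sed -n 's/^author //p')"

echo "line 1 was last changed by: $line1_author"
echo "line 2 was last changed by: $line2_author"
```

In the human-readable output, line 1 should carry the ^ marker because it has not changed since the first commit, and the uncommitted line should be attributed to the placeholder author rather than to a real person.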
The command blame is poorly named. It immediately, and unnecessarily, creates an antagonistic view of the code. I much prefer the commands used in one of Git’s competitors, Bazaar: annotate, also available under the alias praise. (Full disclosure, Bazaar also has an alias of blame for annotate.) Git does have an annotate command, but the documentation for this command states that it is only for compatibility reasons. It is not a true alias, and the output of blame and annotate differs slightly.
The last person who changed a line of code is often the person most qualified to explain what they were trying to accomplish; coming to them with a fight on your hands is going to decrease the likelihood they’ll come to you for help in the future, which increases the chance of you needing to deal with their future mistakes as well. Check your attitude when using this command, and see if you can shift from blame thinking to simple annotation.
Once you’ve located the line in the file that looks interesting, you can investigate further using the commit ID along with the commands log, diff, and show. Table 9-1 outlines what each of the commands can help you to isolate.
Description | Command |
---|---|
Show the metadata for a particular commit | git log <commit> |
Show the code changed in a particular commit | git show <commit> |
Show the code changed since a particular commit | git diff <commit> |
Start by using the command show to look at the commit message and the changes made in that commit:
$ git show <commit>
If the commit message was well written, it should give you an explanation for why the changes were made in this particular commit. If the detailed commit message includes a reference back to a ticket number in your project management system, you may even be able to read a discussion for the changes made—giving you even more insight into what the developers were thinking when they created the fix. In the tracking system, you may also see other developers who were involved, and anyone who was on the review team for this particular change.
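To see the same amount of detail for that commit and every commit that came before it, use the command log as follows (to list only the commits made after that point, use the range <commit>..HEAD instead):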
$ git log --patch <commit>
The parameter --patch in this context shows you the changes introduced by each of the commits, as opposed to the command diff, which shows you the difference between the referenced commit and the files in the working directory.
blame is not perfect. It only reports on lines that still exist in the version of the file you are looking at. If the bug was introduced in a line that is no longer present (for example, because it was later deleted), blame will not be able to tell you who last edited that line. So it is a good tool to use, but it is not magic.
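One way to work around this limitation is to combine blame with the log search covered earlier in the chapter. The sketch below (scratch repository, the file app.txt, and the function name validate_input are all invented) removes a line, confirms that blame no longer mentions it, and then uses log -S to find the commit where it disappeared.

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'keep me\nvalidate_input()\n' > app.txt
git add app.txt
git commit -q -m "Add input validation"

# A later commit deletes the validation line.
printf 'keep me\n' > app.txt
git commit -q -am "Simplify app"

# blame only knows about lines that are still in the file.
blame_hits="$(git blame app.txt | grep -c 'validate_input')"

# log -S reports commits that added or removed the string; the most
# recent match is the commit in which the line went missing.
removed_in="$(git log -1 --format=%s -S 'validate_input' -- app.txt)"

echo "lines mentioning validate_input in blame: $blame_hits"
echo "most recent commit touching that string:  $removed_in"
```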
Using a combination of blame, log, and diff, you should now be able to review the history of a single file in the context of the total combined history of that file, and in the context of other changes made at the same time. Using the commit message, you may also be able to trace the rationale of why the changes were made. With a little bit of forensic investigation, you can turn your questioning of the author of the code into a productive conversation instead of a Columbo-style interrogation.
Often it can be difficult to figure out exactly when a bug was introduced in your code if you don’t know which file is the problem. If the error message you are looking for is printed to the screen, it can be relatively easy to search through the files in your code base to locate the right file. Sometimes the error message will include the filename and line number where the problem occurred. In any of these cases, you can use the commands diff, log, and blame to gain a better understanding of what has gone wrong. Sometimes the problem code does not leave sufficient clues in the error messages to use these tools. Introducing bisect!
bisect performs a binary search through past commits to help you find the commit where the code went from a known working state to a known broken state. Unlike a regular checkout of a commit, bisect continues to wander through your history (in a very methodical way!) until you have given it enough clues to identify which commit introduced the dysfunctional code. It’s sort of like a historical reenactment of what the developers have done in a code base. At each point in the bisect process, you can launch the product (compile the code; load it in a browser; install the app on your phone; whatever is appropriate for your code base) and determine whether the code at this moment in history was right, or wrong. Once you find the point where things went wrong, you can fix history at that exact moment. It’s like Back to the Future, and Git is your DeLorean.
To begin, you need to be in the top-level directory of your repository. This is the folder where the hidden .git folder resides. Begin the bisect process, and notify Git of one commit ID where the code is known to be good and one commit ID where the code is known to be bad (Example 9-2).
$ git bisect start
$ git bisect good <commit-id>
$ git bisect bad <commit-id>
Git will now proceed to check out a series of commits one at a time, looking for the commit where the code went from good to bad:
$ git bisect start
$ git bisect bad c04f374
$ git bisect good 93b64fc
Bisecting: 10 revisions left to test after this (roughly 4 steps)
[0075f7eda67326f1746] Merge branch 'video-lessons' into integration_test
The repository is now in a detached HEAD state. At this point, you need to confirm whether the code is good or bad and report back your findings:
$ git bisect bad
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[ed8056eb4b2aaf00e6d] Lesson 4: Adding details on using git config
$ git bisect bad
Bisecting: 2 revisions left to test after this (roughly 1 step)
[c88a02babc42bb00a83] Lesson 4: Adding new lesson on configuring Git
$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[f1fa8e7e382f68c0558] Lesson 3: Extended descriptions for cloning a ...
$ git bisect good
ed8056eb4b2aaf00e6d is the first bad commit
commit ed8056eb4b2aaf00e6d9d183f974ed612d6f10e6
Author: emmajane <[email protected]>
Date: Sun Sep 7 12:50:58 2014 +0100

    Lesson 4: Adding details on using git config

    Added commands to customize the following:
    - username (or real name, as you prefer)
    - email address
    - enable color helpers within the git messages

    Added a self-study piece on customizing your command prompt to
    include additional color and branch information.

:040000 040000 e927a1263e6e23eb5237a363a20640f62349b27d 31bc6c57d6acd8de214a63a47914b32d6809a866 M lessons
The problem commit has been located. At this point, you are in a detached HEAD state, but you also know which commit you need to come back to. To return to the tip of your branch, with the new information, use the subcommand reset. This command can also be used at any point during the bisect process to abandon the search and return to the most recent commit on your branch:
$ git bisect reset
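If the check for good versus bad can be expressed as a command, the subcommand run can answer the good and bad questions for you. The following is a minimal sketch under some assumptions: the scratch repository has ten invented commits, the "bug" is simply the word BROKEN appearing in app.txt from the seventh commit onward, and the test is a one-line grep. With run, an exit code of 0 from the test command marks a commit as good, and an exit code of 1 marks it as bad.

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Ten commits; the bug (the word BROKEN) sneaks in with commit 7.
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "change $i" >> app.txt
  if [ "$i" -eq 7 ]; then echo "BROKEN" >> app.txt; fi
  git add app.txt
  git commit -q -m "Commit $i"
done

first_commit="$(git rev-list --max-parents=0 HEAD)"

git bisect start > /dev/null
git bisect bad HEAD > /dev/null
git bisect good "$first_commit" > /dev/null

# The test command exits 0 (good) when BROKEN is absent, 1 (bad) when present.
git bisect run sh -c '! grep -q BROKEN app.txt' > /dev/null

# While the bisect session is open, refs/bisect/bad points at the culprit.
culprit="$(git log -1 --format=%s refs/bisect/bad)"
echo "first bad commit: $culprit"

git bisect reset > /dev/null 2>&1
```

In a real project, the grep would be replaced by whatever command tells you the code is healthy, such as your test suite.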
If you have not done a lot of programming, the binary search process can feel a bit like magic. (Really freaking cool magic, mind you.) If you want to remove some of the mystery, you can use the subcommand visualize to show you the current status of the bisect process (Figure 9-3). The outer good and bad commits will be identified in the GUI you have configured for gitk.
It is assumed that the current work is bad. So, you can’t go back and find when something is fixed—you need to go back and find where something broke. It can be very confusing if you try to find where a fix was introduced, although it is possible. You just need to remember to reverse the definitions of good and bad.
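Here is a sketch of that reversal, assuming a scratch repository in which the bug is present from the first commit and quietly fixed in the sixth. The repository, the files, and the marker word BROKEN are invented. Because the search is for the fix, "good" stands in for "still broken" and "bad" stands in for "already fixed".

```shell
# Throwaway repository; every name below is invented for the sketch.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# The bug exists from the start and is fixed in commit 6.
echo "BROKEN" > bug.txt
for i in 1 2 3 4 5 6 7 8; do
  echo "change $i" >> app.txt
  if [ "$i" -eq 6 ]; then echo "fixed" > bug.txt; fi
  git add app.txt bug.txt
  git commit -q -m "Commit $i"
done

first_commit="$(git rev-list --max-parents=0 HEAD)"

git bisect start > /dev/null
git bisect bad HEAD > /dev/null               # "bad" here means: already fixed
git bisect good "$first_commit" > /dev/null   # "good" here means: still broken

# grep exits 0 while the bug is present ("good") and 1 once it is gone ("bad").
git bisect run grep -q BROKEN bug.txt > /dev/null

fix_commit="$(git log -1 --format=%s refs/bisect/bad)"
echo "the fix was introduced in: $fix_commit"

git bisect reset > /dev/null 2>&1
```

If swapping the words in your head is too confusing, check whether your version of Git accepts the neutral terms old and new (git bisect old and git bisect new) in place of good and bad.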
I will happily admit that I am a crime drama TV junkie, so the chapter on using Git for forensic investigation appeals to me greatly. In this chapter, you have been exposed to a few of the commands I include in my detective toolkit:
stash allows you to set aside your current work so you can check out another branch.
blame allows you to find the line-by-line history of a file.
bisect allows you to search methodically through history to find the spot where things went wrong.
These tools, when paired with the information in Chapter 6 on recovering from mistakes, will help you dig into, and recover from, just about any crime scene you may end up investigating.