Rewriting History with Git

By Johan Thelin on Monday, June 20th, 2011, in Technical

The Git versioning system allows you to manage large-scale distributed code development projects with thousands of parallel branches. Its powerful checkout, merge, push, and pull capabilities help you work with code branches and clones. Sometimes you, as a contributor, might like to polish a branch you’ve checked out and reduce the number of commits you made before merging it with the master branch. One of Git’s nicer features is that it lets you rewrite history.

Rewriting history is not about changing the end result, but about making things cleaner and clearer. For instance, you can remove the commits in which you added and later removed debug output, or fold the fixes you made to a feature into the commit that added the feature itself. The overall goal is to simplify history for the person who merges your branch into the master branch.

In Git, each repository is a clone that contains the project’s entire history, up to the last time it was synchronized. Your local repository therefore holds the same information as the master branch, or any other developer’s branch, as of that moment. All repositories share the common history, and from Git’s point of view no branch represents the “true” history. Therefore you should never rewrite history after you’ve pushed or shared your branch. If you do, other repositories will contain alternate versions of the same development history, and merging them could get messy.

The Rebase

One worthwhile technique for rewriting history is the rebase, which changes the starting point of your branch to simplify merging it into the master branch. You may want to rebase when you branch off master and work on your branch for a long time, making many commits, while master keeps moving. The rebase command moves the point where your branch diverges from master to master’s latest commit and replays your commits on top of it. Your branch then starts from the current state of master, which in turn makes merging it back into master easier.

Let’s look at an example. You start by cloning the repository and creating a branch from the master branch. You then modify a file, say README.txt, and commit your changes to it a number of times as you work. These commits end up in your local clone.

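git clone url
cd repository-directory    # the directory that git clone created
git checkout -b name-of-branch
vim README.txt
git commit -a              # repeat the edit and commit as you work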
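The commands above assume a real project URL and an interactive editor. As a self-contained sketch that runs unattended, the script below uses a throwaway local bare repository as a stand-in for url and uses echo in place of vim. It assumes a POSIX shell with Git installed; the temporary paths, author identity, and commit messages are all hypothetical.

```shell
#!/bin/sh
# Sketch of the clone-and-branch workflow. A throwaway bare repository
# stands in for the project URL; all names here are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical identity so the commits below work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Build the stand-in "remote": one commit on master, then a bare copy.
git init -q "$tmp/seed"
cd "$tmp/seed"
git symbolic-ref HEAD refs/heads/master
echo "Project readme" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git clone -q --bare "$tmp/seed" "$tmp/origin.git"

# The workflow from the text: clone, create a branch, edit and commit.
git clone -q "$tmp/origin.git" "$tmp/work"
cd "$tmp/work"
git checkout -q -b name-of-branch
echo "First note" >> README.txt
git commit -q -a -m "Add first note to README"
echo "Second note" >> README.txt
git commit -q -a -m "Add second note to README"

# The new commits exist only in the local clone, on top of master.
git log --oneline master..name-of-branch
```

Nothing has been pushed at this point, so origin/master still points at the initial commit; the two README commits live only in the local clone.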

While you work in your branch, other people are changing the same file in the master branch, in such a way that merge conflicts are bound to arise. You must therefore decide whether to handle those conflicts in your branch or in the master branch. The answer depends on your project: if you are responsible for both merge operations, it is a matter of taste. If another developer will merge your branch into the master branch, it is usually preferable for you to resolve the conflicts yourself by rebasing. When rebasing, you handle all conflicts in your branch, which makes merging your branch into the master branch trivial.

The following list of commands, run from the working branch, shows the basic process:

git rebase master
vim README.txt
git add README.txt
git rebase --continue

When you run git rebase master, Git replays your commits on top of master and stops when it detects a conflict in README.txt. The conflicting regions are marked in the file, and you use an editor (vim, in this case) to resolve them. You then add the file to the Git staging area and continue the rebase. If several of your commits conflict, Git stops at each of them in turn, and you repeat these steps.
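To see that conflict cycle end to end, here is a self-contained sketch that builds a throwaway repository in which master and name-of-branch both change the same line of README.txt. It assumes a POSIX shell and Git; the file contents, author identity, and temporary paths are hypothetical, and an echo replaces the manual vim edit. GIT_EDITOR=true is used only so that git rebase --continue accepts the existing commit message instead of waiting for an editor.

```shell
#!/bin/sh
# Sketch: rebase a branch onto master and resolve a README.txt conflict
# without an interactive editor. Names and file contents are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
echo "Version line" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Your branch changes the line...
git checkout -q -b name-of-branch
echo "Version line, edited on the branch" > README.txt
git commit -q -a -m "Edit README on branch"

# ...while master changes the same line differently.
git checkout -q master
echo "Version line, edited on master" > README.txt
git commit -q -a -m "Edit README on master"

# Back on the branch, the rebase stops at the conflict (non-zero exit).
git checkout -q name-of-branch
if git rebase master; then
    echo "unexpected: rebase finished without a conflict" >&2
    exit 1
fi

# Stand-in for "vim README.txt": write the resolved content.
echo "Version line, edited on master and on the branch" > README.txt
git add README.txt
# GIT_EDITOR=true keeps the original commit message unchanged.
GIT_EDITOR=true git rebase --continue

git log --oneline
```

After the rebase, the merge base of the two branches is master's tip, and the branch carries exactly one commit on top of it.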

When all the conflicts have been resolved, the point in history where your branch diverges from the master branch has been moved to the latest commit in the master branch. Merging the working branch into the master branch is then painless: simply run git merge name-of-branch from your master branch. Because the branch being merged starts from the tip of the master branch, the merge simply adds your commits on top of master (a fast-forward), provided nobody has committed to master in the meantime.
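A small sketch can confirm that the merge after a rebase is a fast-forward. It assumes a POSIX shell and Git; the branch, file, and identity names are hypothetical. The --ff-only flag is a choice made here, not something the text requires: it makes git merge refuse to do anything other than a fast-forward, so the script fails loudly if the branch was not actually based on master's tip.

```shell
#!/bin/sh
# Sketch: after a rebase, merging the branch into master is a fast-forward.
# Branch, file and identity names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
echo "readme" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Work on the branch...
git checkout -q -b name-of-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# ...while master moves on (touching a different file, so no conflict).
git checkout -q master
echo "news" > NEWS.txt
git add NEWS.txt
git commit -q -m "Add NEWS"

# Move the branch point to the tip of master.
git checkout -q name-of-branch
git rebase -q master

# From master, the merge only has to move the branch pointer forward.
git checkout -q master
git merge --ff-only name-of-branch
```

Because the merge is a fast-forward, no merge commit is created and master ends up pointing at the same commit as name-of-branch.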

Altering Commits

Another way to rewrite history is to alter commits. You can merge multiple related commits into one, remove irrelevant commits, or reorder commits. Having a cleaner history makes project development easier to follow – something other developers always appreciate.

One of my favorite Git features is the ability to commit changes to a local clone of a remote repository and merge them back into the master branch later. Because commits are local, you can also go back in history even when you’re not online. This encourages you to maintain a fine-grained history, whether you’re working at the office or traveling.

Committing often makes it easy to follow the development history, but it also produces quite a few intermediate commits, some of which cancel each other out. For instance, I often read commit messages such as “Added trace outputs to foo,” closely followed by “Cleaned up trace outputs in sane parts of foo.” Other commits might simply save work in progress before the end of a working day. These can break up a feature implementation into multiple commits, often with non-working intermediate stages.

To modify commits, you run an interactive rebase on your branch. That is, you replay the most recent commits, with the option to alter each commit as it is replayed. To start this process, you issue the command git rebase --interactive branch-point, where the branch point is the last commit before the part of the history that you want to rewrite. You can either pick a hash from the commit log as your branch point or use a relative reference such as HEAD~3 to modify the last three commits.
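Before starting an interactive rebase, it can help to check which commits a given branch point selects. The sketch below, assuming a POSIX shell and Git, builds a hypothetical four-commit history and uses git log to show that HEAD~3 is the untouched branch point while HEAD~3..HEAD covers exactly the three commits that git rebase --interactive HEAD~3 would offer. The repository, file, and commit names are made up for illustration.

```shell
#!/bin/sh
# Sketch: see which commits "git rebase --interactive HEAD~3" would offer.
# Repository, file and commit names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
for n in 1 2 3 4; do
    echo "change $n" >> foo.txt
    git add foo.txt
    git commit -q -m "Commit $n"
done

# HEAD~3 is the branch point: the last commit that stays untouched.
git log -1 --oneline HEAD~3
# These three commits would be replayed, listed oldest first, which is
# the order the interactive rebase uses for its todo list.
git log --reverse --oneline HEAD~3..HEAD
```

Passing the hash printed by the first git log command instead of HEAD~3 selects the same branch point.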

When you perform an interactive rebase, Git opens the list of commits in your editor, oldest first, and you can reorder, edit, or remove them. Reordering commits lets you make the change history clearer. As the rebase is performed, the commits are replayed, so you might face merge conflicts. Similarly, removing a commit from the list removes the entire commit, including its file changes, which may also lead to merge conflicts. Resolving the conflicts and resuming the rebase works in much the same way as when rebasing a separate branch.
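Normally you delete lines from the todo list by hand. To make the effect reproducible, this sketch scripts that step: GIT_SEQUENCE_EDITOR is a Git environment variable that overrides the editor used for the rebase todo list, and here it points at a small hypothetical helper, drop-trace.sh, that deletes every todo line mentioning "trace outputs". The repository and commit messages are invented to mirror the trace-output example in the text, and a POSIX shell is assumed. Because the two trace commits cancel each other out, dropping both replays cleanly without conflicts.

```shell
#!/bin/sh
# Sketch: drop the "trace outputs" commits with a scripted interactive
# rebase. The helper script and all names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
echo "int foo(void);" > foo.c
git add foo.c
git commit -q -m "Initial commit"

echo "feature code" > feature.c
git add feature.c
git commit -q -m "Add feature to foo"
echo "TRACE" >> foo.c
git commit -q -a -m "Added trace outputs to foo"
sed '/TRACE/d' foo.c > foo.c.tmp && mv foo.c.tmp foo.c
git commit -q -a -m "Cleaned up trace outputs in foo"
echo "more feature code" >> feature.c
git commit -q -a -m "Finish feature"

# Hypothetical helper: Git passes the todo file path as $1; removing a
# line from the todo list drops that commit.
cat > "$tmp/drop-trace.sh" <<'EOF'
#!/bin/sh
grep -v 'trace outputs' "$1" > "$1.new"
mv "$1.new" "$1"
EOF
chmod +x "$tmp/drop-trace.sh"

# Rewrite the last four commits; the two trace commits disappear.
GIT_SEQUENCE_EDITOR="$tmp/drop-trace.sh" git rebase --interactive HEAD~4

git log --oneline
```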

There are also a number of ways to alter a commit. For instance, you can reword the commit message, squash multiple commits into one, or edit a commit. Editing a commit makes the rebase stop after replaying it, so you can amend it by hand or perhaps split it into multiple commits.
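Squashing a fix into the commit it belongs to is common enough that Git has a shortcut for it. This sketch, assuming a POSIX shell and Git, records the fix with git commit --fixup, which creates a commit titled "fixup! <original subject>", and then runs git rebase --interactive --autosquash, which moves that commit directly below its target and marks it as a fixup in the todo list. GIT_SEQUENCE_EDITOR=true simply accepts the prepared list without opening an editor. All file names and messages are hypothetical.

```shell
#!/bin/sh
# Sketch: fold a later fix into the commit that introduced the feature.
# File names, messages and identity are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
echo "readme" > README.txt
git add README.txt
git commit -q -m "Initial commit"

echo "feature v1" > feature.c
git add feature.c
git commit -q -m "Add feature"
echo "unrelated" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# A later fix for the feature, recorded as "fixup! Add feature".
echo "feature v1, fixed" > feature.c
git commit -q -a --fixup=HEAD~1

# --autosquash reorders the todo list so the fixup lands on its target.
GIT_SEQUENCE_EDITOR=true git rebase --interactive --autosquash HEAD~3

git log --oneline
```

The result is three commits, with the fix folded into "Add feature" and no separate fixup commit left in the history.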

Your Favorite Moments

In addition to letting you rearrange, rewrite, and generally alter history, Git also lets you pick out your favorite changes. A process known as cherry-picking applies individual commits from other branches to your branch. This comes in handy when you need a specific bug fix in a branch but don’t want to merge a complete branch into it.

Suppose you have tagged a release, say 1.0, and then fixed a bug in the master branch, which is heading for version 2.0, that also affects 1.0. By picking out the hash of the bug fix from git log, you can apply it to your 1.0 branch with the simple command git cherry-pick hash. This can, of course, lead to merge conflicts, which you have to resolve manually.
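The backport scenario can be sketched as a self-contained script. It assumes a POSIX shell and Git; the tag v1.0, the maintenance branch release-1.0 (created from the tag to play the role of "your 1.0 branch"), and the file contents are hypothetical. The fix's hash is looked up with git log --grep rather than copied by hand.

```shell
#!/bin/sh
# Sketch: backport one bug fix from master to a 1.0 maintenance branch.
# The tag, branch, file names and messages are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
printf 'line one\nline two with bug\n' > app.txt
git add app.txt
git commit -q -m "Release 1.0"
git tag v1.0
git branch release-1.0 v1.0

# Development toward 2.0 continues on master, including the fix.
echo "feature for 2.0" > feature.txt
git add feature.txt
git commit -q -m "Start 2.0 feature"
printf 'line one\nline two fixed\n' > app.txt
git commit -q -a -m "Fix bug in line two"

# Pick out the hash of the fix from the log...
fix=$(git log --format=%H --grep='Fix bug' -n 1 master)

# ...and apply only that commit to the 1.0 branch.
git checkout -q release-1.0
git cherry-pick "$fix"
```

The 1.0 branch receives the fix as a single new commit, while the unrelated 2.0 feature stays on master only.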

Considered Harmful?

Some developers consider Git’s history-rewriting features harmful, as they can lead to different versions of the same piece of history. Rewriting history in a Git repository is safe as long as you have not shared that part of the history with anyone else. However, you can wreak havoc by altering the history of a repository that someone else has cloned, or that you have pushed to a remote location. When Git has to deal with multiple competing versions of history, its decentralized nature makes it impossible to determine which one is right.

Still, treated with care, Git’s ability to clean up and rewrite history lets you work more freely than you can with other, non-distributed versioning systems. It lets you commit frequently and thus build a project history that’s easy to follow. At the same time, you can reduce the noise for your fellow project members by cleaning up history when your work is ready to be pushed to the rest of the world.


Johan Thelin

Johan Thelin has worked as a developer since 1995 on systems from enterprise-scale servers to embedded Linux systems. He has written for numerous magazines, and authored the book Foundations of Qt Development.

2 Responses to “Rewriting History with Git”

  1. [...] git-svn is a neat, straightforward tool that allows you to run a Git repository locally, then sync back against a central Subversion repository. Obviously, this is useful if you want to try Git out, or if your workmates aren’t interested in switching but you prefer Git. But it’s also handy for anyone who regularly works offline – for example, when traveling. Git, unlike SVN, is a distributed version control system, which means that you have your own local copy of the repository. This, combined with the ease of branching (which we’ll talk about in a moment), means that you can keep track of your changes locally and incrementally, committing them to your own repository as often as you like, until you’re ready to commit the whole lot back to the main repository. By contrast, with a normal Subversion repository, it’s all or nothing; you can’t track any incremental changes that you can’t or don’t want to commit to the main repository – as you might wish to do, for example, with temporarily broken code on its way to a refactor. If you’ve ever got halfway through coding a feature change and started to feel nervous about how much you have untracked, you’ll appreciate this option. For more on Git’s advantages in this area, see the Wazi article Rewriting History with Git. [...]

  2. Juanje says:

    I had to clean up an internal project’s history at my company (Emergya) before publishing the code, and I had to use those commands, but I also found this documentation very useful:
    http://progit.org/book/ch6-4.html

    Sometimes you need to remove an internal password used for testing, or some sensitive data belonging to your company or your client. In those cases I found the ‘filter-branch’ command very useful. It is also useful for fixing bad email addresses and usernames in all the commits at once.

    I hope this info is as useful for you as it was for me :-)

    Good job, by the way :-)


© 2012 OpenLogic, Inc. | Licensing | Privacy Policy | Terms of Use
