As you’re no doubt aware, Git and Mercurial are great at re-integrating divergent lines of development through merging. They have to be, since their design strongly encourages developers to commit changes in parallel in their own distributed environments. Eventually some or all of these commits have to be brought together into a shared graph, and merging and rebasing are two primary ways that let us do that. So which one do you use?
What does Merge or Rebase mean?
Let’s start by defining what merging and rebasing are.
Merging brings two lines of development together while preserving the ancestry of each commit history.
In contrast, rebasing unifies the lines of development by re-writing changes from the source branch so that they appear as children of the destination branch – effectively pretending that those commits were written on top of the destination branch all along.
Here’s a visual comparison between merging and rebasing a branch ‘feature/awesomestuff’ back to the master branch (click for full size):
So merging keeps the separate lines of development explicitly, while rebasing always ends up with a single linear path of development for both branches. But this rebase requires the commits on the source branch to be re-written, which changes their content and their SHAs. This has important ramifications which we’ll talk about below.
[An aside: merging in Git can sometimes result in a special case: the ‘fast forward merge’. This only applies if there are no commits in the destination branch which aren’t already in the source branch. Fast-forward merges create no merge commit and the result looks like a rebase, because the commits just move over to the destination branch – except no history re-writing is needed (we’ll talk about this re-writing in a second). You can turn fast-forward merges off in SourceTree so that a merge commit is always created if you want – check the ‘Create a commit’ option in the Merge dialog or set it globally in Preferences > Git.]
So, what are the pros and cons of merging and rebasing?
Pros and Cons
- Simple to use and understand.
- Maintains the original context of the source branch.
- The commits on the source branch remain separate from other branch commits, provided you don’t perform a fast-forward merge. This separation can be useful in the case of feature branches, where you might want to take a feature and merge it into another branch later.
- Existing commits on the source branch are unchanged and remain valid; it doesn’t matter if they’ve been shared with others.
- If the need to merge arises simply because multiple people are working on the same branch in parallel, the merges don’t serve any useful historic purpose and create clutter.
- Simplifies your history.
- Is the most intuitive and clutter-free way to combine commits from multiple developers in a shared branch
- Slightly more complex, especially under conflict conditions. Each commit is rebased in order, and a conflict will interrupt the process of rebasing multiple commits. With a conflict, you have to resolve the conflict in order to continue the rebase. SourceTree guides you through this process, but it can still become a bit more complicated.
- Rewriting of history has ramifications if you’ve previously pushed those commits elsewhere. In Mercurial, you simply cannot push commits that you later intend to rebase, because anyone pulling from the remote will get them. In Git, you may push commits you may want to rebase later (as a backup) but only if it’s to a remote branch that only you use. If anyone else checks out that branch and you later rebase it, it’s going to get very confusing.
Looking at the above pros/cons, it’s clear that it’s generally not a case of choosing between one or the other, but more a case of using each at the appropriate times.
To explore this further, let’s say you work in a development team with many committers, and that your team uses both shared branches as well as personal feature branches.
With shared branches, several people commit to the same branch, and they find that when pushing, they get an error indicating that someone else has pushed first. In this case, I would always recommend the ‘Pull with rebase’ approach. In other words, you’ll be pulling down other people’s changes and immediately rebasing your commits on top of these latest changes, allowing you to push the combined result back as a linear history. It’s important to note that your commits must not have been shared with others yet.
In SourceTree, you can do this in the Pull dialog:
You can also set this as the default behavior in your Preferences if you like:
By taking this approach, when developing in parallel on a shared branch, you and your colleagues can still create a linear history, which is much simpler to read than if each member merges whenever some commits are built in parallel.
The only time you shouldn’t rebase and should merge instead is if you’ve shared your outstanding commits already with someone else via another mechanism, e.g. you’ve pushed them to another public repository, or submitted a patch or pull request somewhere.
Now let’s take the case where you deliberately create a separate branch for a feature you’re developing, and for the sake of this example, you are the only person working on that feature branch. This approach is common with git-flow and hg-flow for example. This feature branch may take a while to complete, and you’ll only want to re-integrate it into other lines of development once you’re done. So how do you manage that? There are actually two separate issues here that we must address.
The final merge: When building a feature on a separate branch, you’re usually going to want to keep these commits together in order to illustrate that they are part of a cohesive line of development. Retaining this context allows you to identify the feature development easily, and potentially use it as a unit later, such as merging it again into a different branch, submitting it as a pull request to a different repository, and so on. Therefore, you’re going to want to merge rather than rebase when you complete your final re-integration, since merging gives you a single defined integration point for that feature branch and allows easy identification of the commits that it comprised.
Keeping the feature branch up to date: While you’re developing your feature branch, you may want to periodically keep it in sync with the branch which it will eventually be merged back into. For example, you may want to test that your new feature remains compatible with the evolving codebase well before you perform that final merge. There are two ways you can bring your feature branch up to date:
- Periodically merge from the (future) destination branch into your feature branch. This approach used to cause headaches in old systems like Subversion, but actually works fine in Git and Mercurial.
- Periodically rebase your feature branch onto the current state of the destination branch
The pros and cons of each are generally similar to those for merging and rebasing. Rebasing keeps things tidier, making your feature branch appear quite compact. If you use the merge approach instead, it means that your feature branch will always branch off from its original base commit, which might have happened quite a long time ago. If your entire team did this and there’s a lot of activity, your commit history would contain a lot of parallel feature branches over a long period of time. Rebasing continually compacts each feature branch into a smaller space by moving its base commit to more recent history, cleaning up your commit graph.
The downside of rebasing your feature branches in order to keep them up to date is that this approach rewrites history. If you never push these branches outside your development machine, this is no problem. But assuming that you do want to push them somewhere, say for backup or just visibility, then rebasing can cause issues. On Mercurial, it’s always a bad idea – you should never push branches you intend to rebase later. With git, you have a bit more flexibility. For example, you could push your feature branches to a different remote to keep them separate from the rest, or you could push your feature branches to your usual remote, as your development team is aware that these feature branches will likely be rewritten so they should not check them out from this remote.
The consensus that I come across most frequently is that both merge and rebase are worth using. The time to use either is entirely dependent on the situation, the experience of your team, and the specific DVCS you’re using.
- When multiple developers work on a shared branch, pull & rebase your outgoing commits to keep history cleaner (Git and Mercurial)
- To re-integrate a completed feature branch, use merge (and opt-out of fast-forward commits in Git)
- To bring a feature branch up to date with its base branch:
- Prefer rebasing your feature branch onto the latest base branch if:
- You haven’t pushed this branch anywhere yet, or
- You’re using Git, and you know for sure that other people will not have checked out your feature branch
- Otherwise, merge the latest base changes into your feature branch
- Prefer rebasing your feature branch onto the latest base branch if:
I hope this helps! Please let me know in the comments if you have any questions or suggestions.