A Guided Tour of Mercurial
Here at Medallia we have recently switched from Subversion to Mercurial for some of our projects. While both are very good tools for managing a source code tree there is a significant difference in the philosophy between the two tools and the problems they are trying to solve. While I will not claim that Mercurial is perfect it has turned out to be a very significant improvement for us, mainly because of its support for tracking branches, which Subversion instead leaves in the (perhaps not so) capable hands of its users.
The following is a guide I wrote which we are using internally to get people up to speed on using Mercurial. While there are several other guides out there I have not found one which, in my opinion, explains well how to effectively work with named branches (probably because they were only relatively recently added to Mercurial). I also wanted a single document which explains everything necessary to start working in a clear and concise way. My hope is that this guide provides that.
Mercurial's main distinguishing feature as compared to a more traditional SCM system like Subversion is that it is fully distributed. In practice this means that there is no central repository; rather, every user has a complete copy of the repository, which is always part of a working copy. Thus, instead of committing changes to a central repository, you commit them to your local repository and only later transfer (push) them over the network to a public place.
Getting the repository
Getting a copy of the repository, or cloning it as it is called, is done with the clone command:
hg clone ssh://server//hg/repos my-repos
This will create a new directory, my-repos, that has a complete copy of the 'repos' repository.
Branches
In Mercurial everything is a branch; that is, every time you commit something you either create or extend a branch. The repository can be visualized as a graph where the nodes are the individual revisions (i.e. snapshots of all the files in the repository at a certain point in time) and the links between them are the changes that have to be made to get from one node to the other.
Every node (except the very first of course) will have at least one parent, and some, where a merge was done, will have two. A working copy will always correspond to a specific node (revision) and when you commit your changes a new node is created which has that node as its parent. A head is a node that has no children (think of Medusa).
For example, let us imagine that both Alice and Bob have cloned a repository from a public server and their working directories correspond to revision 1. They both make some changes and commit their files to their local repositories. Alice pushes her changes to the public server first and sometime later Bob tries to do the same. Mercurial now tells him that doing so would create an extra head in the repository and asks if perhaps he instead meant to merge his changes.
That was indeed what Bob wanted and he first does a pull from the public repository. His local repository now has three nodes: 1, which both he and Alice started working from, 2, which contains Bob's changes, and 3, which has Alice's changes. Both 2 and 3 have 1 as their parent and they have no children, thus the repository has two heads.
What Bob wants to do is merge his changes with Alice's, so he types:
hg merge 3
What this means is to merge the changes from revision 3 with the revision in the working directory, which in Bob's case corresponds to revision 2. He then types:
hg commit -m 'merge'
In this commit a new node (revision 4) is created which has both 2 and 3 as its parents. Bob can now push this revision to the public repository, and when Alice later pulls she will get revision 4 which has Bob's changes.
Named branches
Mercurial also has the concept of named branches. A named branch is basically just a special tag that is applied to a revision when it is committed. By typing:
hg branch my-branch
you tell Mercurial to attach the branch tag my-branch to any commits that you make (type 'hg branch' to see which branch tag you are currently using). You can get a list of all the branch tags in the repository by typing:
hg branches
Mercurial will list all the branch tags and also the highest revision number of each branch, also called the tip of the branch. This is important: it is the revision that Mercurial will use if you tell it to e.g. update your working directory to a given branch or to push a given branch, both of which you will be doing frequently.
Let us enlist Alice and Bob again to give us an example. As before, they both have cloned a public repository and are at revision 1. Alice types 'hg branch alice' while Bob types 'hg branch bob'. They both make some changes and commit them to their local repositories. Alice pushes her changes, and after a while Bob tries to do the same. As before, Mercurial asks if he instead meant to merge his changes, so he does a pull from the public repository.
His repository now has three revisions, but revision 2 has the branch tag 'bob' while revision 3 has 'alice'. Since Bob wants to continue working independently of Alice he does not want to merge; instead he uses the '-f' switch when pushing to tell Mercurial that he really wants to create a new branch. The public repository now has two heads, and when Alice does a pull she can see Bob's new branch.
Collaborating on branches
Let us continue the above example: Bob has his own branch that he is working on, but since he is a nice guy he also wants to help Alice with an update she requested. Since his own branch is in a bit of a mess at the moment he does not want to put any of that in Alice's very tidy branch. He makes sure he has committed all his own changes (i.e. typing 'hg status' does not print anything) and then types:
hg up -C alice
This changes his working directory over to Alice's branch; all his own changes are removed and his working directory is now an exact copy of the tip of Alice's branch. He can now happily do the requested change and then commit (Mercurial changes the active branch tag automatically, but you can type 'hg branch' to verify that it says alice). To push this so Alice can get his update Bob types:
hg push -r alice
Now only the changes to the alice branch are pushed to the public server. The next time Alice does a pull she will get Bob's update. Let us imagine that she had continued development in parallel with Bob and had committed several changes to her local repository, without pushing them to the public server. Let us say that the last version she pushed was 10 and that this is the revision that Bob made his changes to. Her local repository will now have a revision X which has 10 as its parent and contains Bob's changes; she also has revisions 11, 12 and 13 which are her own changes and they also descend from 10. To integrate Bob's changes Alice first makes sure she has committed all her local changes and then types:
hg merge X
hg commit -m 'merge in change from Bob'
Note that if there were any conflicts as a result of the 'hg merge' command Alice would have to resolve those before typing the commit command (more on this later).
Alice now has a new revision, 14, which has 13 and X as its parents. Note that I used X here instead of a revision number for Bob's change; the reason for this is explained in the next section.
First I would like to mention a few words of caution. A consequence of the way named branches work in Mercurial, which might be surprising at first, is that while normally each named branch that is still active will also be a head in the repository, when you merge changes from one named branch into another this is no longer the case. Only the branch which you merged into remains a head, and when you push the merge revision Mercurial will print '(-1 heads)'. If someone else later wants to continue development on the other branch they will have to use the '-f' flag to tell Mercurial to "reopen" the branch and it will print '(+1 heads)'.
Normally the fact that a head is removed when you merge two named branches is not a problem; however, there is one case where care needs to be taken. If, after a branch merge revision has been pushed, someone else commits a revision on the branch which is no longer a head, and the parent of that revision is older than the revision which got merged, then some revisions are now only found on the other branch. That is, if 'foo' was merged into 'bar', some revisions which were originally committed on branch 'foo' are now only present on the branch 'bar'. Thus, when pushing a revision which would create a new head, one should always check whether a merge is needed as well.
Revision numbers
Since Mercurial is a distributed system there is no coordination of how revision numbers are assigned. This means that Alice and Bob can each have a revision 2 that are actually very different; the only reason Mercurial uses these simple revision numbers at all is to save you some typing. Always remember that they are local to a repository and that when talking to someone else you have to use the revision id. This is the hex string that Mercurial always prints next to the revision number, e.g. 42:a416abc8f0e1. The id is a 'fingerprint' (a hash) of the revision, covering its contents, its metadata and its parents, so a given revision has the same id in every repository it is pushed or pulled to, without any coordination between them.
The revision id can always be used instead of the revision number, thus you can write:
hg up 42
hg up a416abc8f0e1
Both will do the same thing; however, only the second is guaranteed to do the same thing no matter which repository you run it in.
Named branches and updating your working copy
A word of caution about updating your working copy from a repository containing named branches: always give the name of the branch you are working with when updating. E.g. always write:
hg up my-branch
Omitting 'my-branch' in the above command will give you a warning about the update spanning branches, and ask you to either do a merge or add the '-C' switch if you really want to switch branches. However, if you do this any uncommitted changes in your working directory will be removed, so beware!
You can use the following command to do a 'safe' branch switch, i.e. it will automatically use the '-C' switch only if it is safe to do so (i.e. no uncommitted local changes):
hg status | grep . || hg up -C "$@"
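If you prefer something a little more talkative than the one-liner, the same idea can be written as a shell function. This is a sketch only, and hg_safe_up is a hypothetical name I picked for it. Like the one-liner, it treats any output from 'hg status' as a reason to stop, which includes untracked files.

```shell
# hg_safe_up (hypothetical helper name): switch branches with 'hg update -C',
# but only when 'hg status' reports nothing at all.
hg_safe_up() {
    if [ -n "$(hg status)" ]; then
        echo 'hg_safe_up: working directory is not clean; commit or revert first' >&2
        return 1
    fi
    hg update -C "$@"
}

# Example use, from inside a working copy:
#   hg_safe_up my-branch
```

Put the function in your shell startup file to have it available everywhere.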
Also remember to specify the branch when pushing to a remote repository, although Mercurial will warn you if the push would create a new head in that repository.
Merge conflicts
When you use the 'hg merge' command Mercurial will do its best to integrate the changes. However, if two people make changes to the same lines of code then Mercurial cannot figure out how to handle this and will instead kindly ask you to have a look. There are several tools that will help you with dealing with conflicts and Mercurial can be configured to use the tool you want; if no such tool is installed both versions of the conflict will be inserted in the file with special marker lines around them.
Under OS X Mercurial will use FileMerge.app if Xcode is installed; this is highly recommended. On Windows, KDiff3 can be used.
Looking at changes
Let us say you are going to do a code review and have been told that revision a416abc8f0e1 has the changes in question. Simply type:
hg export a416abc8f0e1
Mercurial will print all the changes that happened in that revision.
Adding and removing files
There are two separate issues here: the first is renaming / moving files, in which case you should use 'hg rename' or its alias 'hg mv'. Failure to do so means that Mercurial does not know that the new file is actually just the old file in a new place, and it will create unpleasant merge conflicts. Another option is to use the '-s' flag to addremove, e.g. write:
hg addremove -s 50
Mercurial will now try to detect if any new files are actually an old file that was moved.
The other case is where you actually deleted old files and then added unrelated new ones. In this case you can either use 'hg remove' and 'hg add' respectively, or the 'hg addremove' command which will automatically mark all missing files as deleted and all new files as added. Do make sure that you have actually deleted all the files you want to delete and do not have any spare files you do not want to check in lying around in your repository (use 'hg status' for this) before using this command.
In any case it is always a good idea to use the '-n' flag first, which tells Mercurial to only print the changes it would otherwise perform so you can verify them.