Tags with Mercurial and Forests

When you have a project in a Mercurial repository, possibly using the Forest extension (as I do), you will most likely want to tag the forest occasionally, be it before a release or (as in my case) when an autobuilder has succeeded and wants to mark a specific version as OK. When working with a forest, there is an obvious problem: there is not a single version to tag, but several, because each tree needs to be tagged. My specific requirements for tagging are these:

  • A tag must identify a specific state of the source tree (forest).
  • It must be reliable. I don’t want to recover from arbitrary errors.
  • Setting and removing a tag must be atomic. No other operation should interfere with it.
  • It must be possible for multiple processes to set and remove tags on a repository.
  • The tags should be available through the VCS (I could simply store the stuff in a database myself, but that would partly defeat the purpose).

Here are my solutions, in the order in which I tried them:

Using hg tag

This is the first thing that comes to mind, simply because there is such a command. Frankly, in hindsight, this seems the worst solution to the problem, at least with forests. I tried to write scripts that walk the forest and set tags in all the repositories. I even started an HG extension for that. Not only is this very complicated, it turns out that it is almost impossible to push back the tags to a master repository reliably, especially when you want to do this automatically. Also, this stores the tag in each of the repositories, introducing changesets in all of them, thus cluttering all the logs. Ugly. Keep away from that.
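To make the per-repository approach concrete, here is a minimal Python sketch of its first step (not the scripts or the extension mentioned above): walk the forest, treat every directory that contains a `.hg` subdirectory as a tree, and build one `hg tag` command per tree. Each of those commands commits a change to that tree’s `.hgtags`, which is exactly where the extra changesets in every log come from. The function names are hypothetical.

```python
import os
from typing import List


# Hypothetical helpers, a sketch only: find the trees of a forest and build
# one `hg tag` command per tree (the commands are returned, not executed).

def find_trees(forest_root: str) -> List[str]:
    """Return the paths (relative to forest_root) of all repositories.

    A directory counts as a repository if it has a `.hg` subdirectory;
    the forest root itself is reported as ".".
    """
    trees = []
    for dirpath, dirnames, _filenames in os.walk(forest_root):
        if ".hg" in dirnames:
            trees.append(os.path.relpath(dirpath, forest_root))
            dirnames.remove(".hg")  # never descend into repository metadata
        dirnames.sort()  # deterministic traversal order
    return sorted(trees)


def tag_commands(forest_root: str, tag: str) -> List[List[str]]:
    """One `hg tag` per tree; each one commits a change to that tree's .hgtags."""
    return [["hg", "-R", os.path.join(forest_root, rel), "tag", tag]
            for rel in find_trees(forest_root)]
```

Running them (for example with `subprocess.run(cmd, check=True)` for each entry) is the easy part; getting all of the resulting changesets pushed back to a master repository is where it falls apart.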

Using snapshots

The next solution is based on hg fsnap, an interesting command: it creates a snapshot of a forest, and later you can use fseed or fclone to re-create the forest in exactly that state. That seemed like a reasonable basis for tagging, so I tried it. My script created snapshots of a specific state, put them in their own repository inside the forest, and pushed them to the master. Since the tags live in their own repository, pushing them back is much easier. However, it is still not 100% fault-proof, especially when several autobuild processes plus developers are involved. Much better than the first solution, but still not recommended.
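Conceptually, a forest snapshot records one changeset ID per tree. The sketch below models that idea with a made-up plain-text format of `<tree> <changeset>` lines; this format is an assumption for illustration and is not the file format that `hg fsnap` actually writes. `restore_commands` shows how such a record maps back to one `hg update -r` per tree in an already-cloned forest.

```python
from typing import Dict, List

# Hypothetical format, one tree per line: "<tree-path> <changeset-id>".
# This is NOT what `hg fsnap` writes; it only models "one changeset per tree".


def format_snapshot(state: Dict[str, str]) -> str:
    """Serialize a tree -> changeset mapping, sorted by tree path."""
    return "".join(f"{tree} {node}\n" for tree, node in sorted(state.items()))


def parse_snapshot(text: str) -> Dict[str, str]:
    """Read the hypothetical format back, skipping blank and comment lines."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tree, node = line.rsplit(" ", 1)
        state[tree] = node
    return state


def restore_commands(state: Dict[str, str]) -> List[List[str]]:
    """One `hg update -r` per tree brings an existing forest back to `state`."""
    return [["hg", "-R", tree, "update", "-r", node]
            for tree, node in sorted(state.items())]
```

Storing such records in their own small repository is what makes them easier to push than per-tree `.hgtags` changes, since all tagging activity touches only that one repository.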

Using clones

Funnily enough, this idea comes from the Subversion development model. Instead of hg tag or hg fsnap, it is very straightforward to simply create a clone as a tag, just like you do with branching. (Yes, I consider in-tree branches somewhat broken, or at least confusing.) Cloning within the same filesystem has almost no overhead, because Mercurial uses hardlinks for the repository data. This makes it easy to set up a structure similar to the one common in Subversion:

  trunk/
  branches/
  tags/

Trunk is where all the main development goes. Branches holds all the clones that are considered branches, be they separate release branches or some kind of development branch. You can do exactly the same with tags: simply clone the forest from trunk (or from a branch) into the tags directory and don’t touch it afterwards. This is very straightforward to implement, and it is the only solution (AFAICS) that fulfills all the requirements above. I’d even go so far as to recommend it over hg tag for non-forest repositories, too. I wish I had had this idea earlier. This is also why having all kinds of commands in the core of HG, named like those in other RCSs, is probably not a good idea: it misleads users into thinking that (for example) hg tag is just the same as cvs tag. It cost me a lot of time.
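With the Forest extension, `hg fclone` copies the whole forest in one step. As a hedged sketch of what that amounts to, the Python below builds the equivalent per-tree `hg clone` commands from `trunk/` into `tags/<name>/`, assuming a hypothetical list of tree paths relative to the forest root (for example as found by walking the forest for `.hg` directories). Parents are cloned before nested trees, so each nested repository lands inside its already-cloned parent. The function name and the `source` parameter are assumptions.

```python
import os
from typing import List


def clone_tag_commands(root: str, trees: List[str], tag: str,
                       source: str = "trunk") -> List[List[str]]:
    """Build `hg clone` commands copying every tree of `source` into tags/<tag>.

    `root` is the directory holding trunk/, branches/ and tags/ (hypothetical
    layout). Trees are ordered parent-first (by depth, then by path).
    """
    dest_root = os.path.join(root, "tags", tag)
    if os.path.exists(dest_root):
        # A tag is never touched again once created; fail early. This is not
        # a lock: `hg clone` itself refuses a non-empty destination anyway.
        raise FileExistsError(dest_root)
    commands = []
    for rel in sorted(trees, key=lambda p: (p.count(os.sep), p)):
        src = os.path.normpath(os.path.join(root, source, rel))
        dst = os.path.normpath(os.path.join(dest_root, rel))
        commands.append(["hg", "clone", src, dst])
    return commands
```

Removing a tag is then just deleting its directory under `tags/`, and since every tag is its own clone, setting one never requires a push or a merge into a shared repository.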


About Roman Kennke
JVM Hacker, Principal Software Engineer at Red Hat's OpenJDK team, Shenandoah GC project lead, Java Champion

2 Responses to Tags with Mercurial and Forests

  1. roman says:

    This is taken from a discussion on the Mercurial mailing list; I hope it sheds a little more light on the specific problems of the different approaches:

    > > To make it short, my solution is to implement tags just like branches:
    > >As simple (f)clones. To be honest, I think the hg tag command is not
    > >very useful and seems like something that doesn’t really fit. It should
    > >probably be an extension, not a core command. At least, using clones
    > >feels much more natural to HG and avoids many problems that I had with
    > >hg tag. I think that this approach should be recommended over hg tag, or
    > >do you generally disagree with that?
    > I’m a little bit curious about this, and in particular curious about the pitfalls of using snapshots.
    > It seems that you’ve come to a conclusion that in order to tag a forest you have a tag tree of fclone’d forests?
    > Doesn’t that mean you have a centralized single location for tags? I liked the snapshots idea because then the “tags” were as distributable as anything else. Perhaps I’m missing something? Could you explain the snapshots issues in more details?
    Sure. A little background first. I have a central forest on a server, to which the developers push their changes. I also have an autobuild infrastructure of several machines, which pull from this central repository, build the software and set a tag when the build is OK (actually, they remove the old OK tags and set a new one). This is done for several target architectures and configurations of the software, so I end up with around 50 different OK tags (one for each combination).

    The problem with hg tag is that it creates a changeset, which then needs to be pushed back to the central repository somehow (it could possibly be handled in a repository private to the autobuild machinery, but that would still be a repository pushed to by all the autobuild processes). If anything is pushed to the repository in the meantime, this requires a merge, with a good chance of needing a manual merge, which is not possible in an automated process. I tried to work around this with a special merge wrapper for .hgtags, but to be honest, I don’t trust it to work reliably enough.
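    For context, .hgtags is a plain text file with one `<changeset-id> <tag-name>` line per entry; within a file, later lines override earlier ones, and removing a tag appends a line with the all-zero null ID. Because tagging only appends lines, a merge wrapper can, in the simplest case, concatenate both sides’ additions. The sketch below is such a naive union merge, assuming both sides only appended to the common base. It is not the wrapper mentioned above, it ignores how Mercurial resolves tags across several heads, and the order of the two sides still decides which entry wins in `read_tags`.

```python
from typing import Dict

# All-zero changeset ID; `hg tag --remove` appends it to delete a tag.
NULL_ID = "0" * 40


def read_tags(text: str) -> Dict[str, str]:
    """Resolve one .hgtags file: later lines win, NULL_ID removes a tag."""
    tags: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        node, name = line.strip().split(" ", 1)
        tags[name] = node
    return {name: node for name, node in tags.items() if node != NULL_ID}


def union_merge(base: str, local: str, other: str) -> str:
    """Naive three-way merge for .hgtags, assuming both sides only appended.

    Keeps base, then local's new lines, then other's new lines that local
    did not add itself. Anything else is rejected as needing a real merge.
    """
    base_lines = base.splitlines()
    local_lines = local.splitlines()
    other_lines = other.splitlines()
    n = len(base_lines)
    if local_lines[:n] != base_lines or other_lines[:n] != base_lines:
        raise ValueError("not an append-only change; needs a real merge")
    local_added = set(local_lines[n:])
    merged = local_lines + [line for line in other_lines[n:]
                            if line not in local_added]
    return "".join(line + "\n" for line in merged)
```

    Even this simple case shows the weak spot: if both sides move the same tag, the merged file silently prefers the other side’s entry, which is hard to trust in an unattended process.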

    Using snapshots in their own repository, the situation is much better than with hg tag with respect to handling everything automatically. The autobuild processes only have to push to this small repository that holds the snapshots. There is still a good chance that things need to be merged, but it is almost guaranteed that this can be done without manual intervention. However, one thing still leaves me with a bad feeling: instead of pushing, I would actually have to either pull the changeset that adds the snapshot from another repository into the master repo and merge, or add the snapshot directly to the master repository. Either way, I’d have to run multiple commands on the master repository. Now imagine what happens when multiple autobuild processes try to do this concurrently. Yes, I could implement some kind of locking. Yes, I could implement a kind of service that does all this from a single process. But both of these solutions add complexity to the whole process, and I’d rather decrease complexity. This is exactly what the tagging-by-cloning approach does for me.
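    The locking mentioned above could start out as simple as the following sketch: a lock file created atomically with `O_CREAT | O_EXCL`, polled until a timeout. It is an assumption for illustration, not something used in the setup described here, and it already hints at the extra complexity: a process that crashes while holding the lock leaves a stale file behind that someone has to clean up, and `O_EXCL` is not reliable on every network filesystem. The function name and the lock path below are hypothetical.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path: str, timeout: float = 30.0, poll: float = 0.1):
    """Hold an exclusive lock by atomically creating `path`; remove it on exit.

    Stale locks left behind by crashed processes are NOT handled.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Fails with FileExistsError if another process holds the lock.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll)
    try:
        try:
            os.write(fd, str(os.getpid()).encode())  # record the holder's PID
        finally:
            os.close(fd)
        yield
    finally:
        os.unlink(path)
```

    Every autobuild process would then wrap its pull/merge/commit sequence in something like `with lock_file("/srv/hg/master.lock"):`, which is exactly the kind of additional moving part that tagging by cloning avoids.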

  2. Glen W says:

    Thanks for the nice fix, this is the same issue we faced in our project.

    I found your site and read a few of your other posts. Keep up the good work. I just added your RSS feed to my Google News Reader. Looking forward to reading more from you down the road!
