Lisp, sql, git and mercurial

This has been a rather unpleasant month (don’t ask, I won’t tell) but right now I’ll look forward toward its end because of two reasons: for one, I’ll be in Hamburg for the European Lisp Symposium for the next two days; the program for the ELS has also been published in between. I’m really looking forward to an interesting set of talks. For another, some patches to CL-SQL which add support for autoincrement behaviour for Postgresql, are probably going to be released soon. To clarify, “autoincrement” is a column constraint in MySQL (among others) that automatically increments the value of the column when a new row is inserted when no value for the autoincrement column is given (cf. MySQL docs on AUTOINCREMENT), a behaviour that Postgresql supports with the serial constraint (cf. this wikibook on converting between MySQL and Postgres). Actually, that has been my first substantial amount of Common Lisp programming in the last two years, which has been triggered by an upgrade of my Debian system. This upgrade implied that an old application of mine would now use CL-SQL version 5.0 which in turn broke the app: I had simply specified a db-type of “serial” previously, but the new CL-SQL code wouldn’t recognize that it had to fetch the automatically generated value from the DB when inserting a new record. More details on the patches can be found on the CL-SQL mailing list.

The developement of this addition was also the first time I had a real-world setup developing with git. In my own projects I use mercurial, so I was eager to learn a little bit more about the differences. It’s funny that a recent opinonated article “Why I like mercurial better than git” more or less talks only about the one point that I found confusing: branch handling. For more background information, I suggest reading this article “A guide to branching in mercurial”. Basically, in my current projects where I use mercurial, I’m using the “branching with clones” approach Steve is describing there. When working on the patches for CL-SQL, I was working on the existing autoincrement branch but when I was through I wanted to port my patches to the master branch. When using mercurial with the described approach, selecting (pulling or pushing) my patches and only my patches to the master branch is dead easy: you just issue a pull/push command restricted to the “right” changesets. Doing this is even supported by Subversion these days via svn cherry picking. Looking at the docs for git pull, fetch and merge, I wasn’t able to figure out what the corresponding “right” incantation for git might look like, if there is one at all. As I didn’t want to hose my “working copy” (sorry for the SVN term again), I resorted to git format-patch, git am resp., which worked fine. Please note that I’m not suggesting that it’s not possible with another approach, quite to the contrary I would be happy to learn about it. One thing that I found rather useful is git’s stash command which let’s you safely abandon your current work to fall back to the last commited version, in order to be able to work on something that popped up in between (typically a minor unrelated problem you encounter while working on a larger piece of changes). I understand that mercurials patch queues enable a similar functionality, but I haven’t used them sofar. Another thing that I found very useful is git’s very easy way to correct (or in git terminology “amend”) a commit by just issuing “git commit -a”. I also like the idea of the “index” or more exactly that you have to explictly “add” which changes you want to commit. A similar behaviour is possible with SVN “changelist” command, but the mere existance of a changelist is not automatically honoured by SVN’s commit.

Version control and testing

What have version control and testing to do with each other? Well, first of all, both are common virtues in the clean code community. What you’ll find is that both virtues are important on their own ground: version control provides a safety guard in that you can roll back to prior versions if you accidently introduce problems in your code. Testing (automated unit tests) provides a safety guard, too, because you can do regression testing when you work with your code. These are both fine goals but seemingly have little to do with each other.

But in reality they do. For sake of argument, let’s take a step back and assume that you have to work in an environment of several developers where neither of these things exists. What will you likely see? What we all have seen several years ago. Commented out code blocks, redundant and often misleading or outdated comments, timestamps with comments cluttered all over the code. And frightened developers that feared each minor change because of the miriad of subtle side effects it might have, let alone major changes to core components. It’s an environment in which refactorings as welll as extensions are very hard and expensive, which results in frightened overworked developers and frustrated managers.

So, what happens when you introduce only one of those virtues? Say, we introduce version control. Now, every change gets documented, except that documenting every change requires, from the developers point of view, documentation at the wrong point. They can’t see the documented changes and the reason for these changes in the source, they see it only in the version control system — iff they add a change message with every change at all. Much more likely is that you will see commit messages such as “.” or “bug fix”, and the same old mess of timestamps, outdated comments and commented out code as before. Why is that? Because your developers are now not as frightened as they used to be (because they can now rely on the version control system to fall back to older versions), but they still have the same need to understand and document the code. And the commit log is both “too far away” from the code and “out of it’s purpose” for this task: the commit log shouldn’t document what the code is supposed to do, only when something was implemented to behave in a particular way.

This is where a development (unit) test suite comes into the picture: you document every required behaviour in tests. With every change to the code, you also update the test. As a developer, you can now look into your test suite to see what the code is supposed to do. Now developers will likely become much more confident with their changes, because they can run the tests and see what happens (hopefully next to immediately) without requiring time- and resource-consuming manual tests.

But what about documenting the changes to the code? Well, you should simply document any changes in the commit message of your version control system, because it’s now no longer necessary to keep the entire version history in mind to understand what the current code state is supposed to do. You have the tests that tell you what the code should do. The commit log now only serves the purpose of documenting what has changed over time and is no longer required to understand what the code should do. So you don’t have to keep the clutter in your code, resulting in much cleaner source code files.

Summary: Taken together, the whole of version control and testing adds up to more than a simple addition of their own values.

Now I'm all over the shop ... or Converting from RCS to Mercurial

For a long time I hadn’t looked closer at those modern distributed revision control systems like Git, Darcs or Mercurial. This was mainly due to two facts: As I’m currently neither involved in any major open source project which uses these systems nor in a project at work which requires the facilities offered by such systems, and as there was no easy access for them in XEmacs, the more traditional systems like Subversion, CVS and RCS are fine for me. However, there was this nagging feeling that I might miss something and as revision systems always have been somewhat of a pet peeve of mine, I eventually spend some time reading up more on them. I’ve read quite a lot of discussions on the web, and gathered that mercurial might be worth a closer look, as it claims to be quite easy to handle, comparably well documented and quite fast. And then finally I’ve read on xemacs-beta that the new vc package (in Pre-Release) would support mercurial as well.

Well, that’s where I am now: I have several pieces of code lying around which I sometimes develop on my main machine and sometimes on my laptop when moving around. This is the scenario where a server-based approach to revision control is not what you want: you won’t be able to access your server while you’re on the road and hence you can’t commit. Now, with RCS that’s not a problem, as there is no server involved. But of course, since RCS is a file-system local revision system, syncing is a major problem and you have to go to great pains to ensure you don’t overwrite changes you made locally in between syncs. I hope that a distributed version control system like mercurial will solve the problem, as I no longer have to decide which version is the current head version, instead cherry-picking change sets at will.

But of course, for this to happen, I have to convert my RCS repositories to Mercurial. This doesn’t seem to be a common problem: there are a lot of tools for conversion from CVS or Subversion (see Mercurial Wiki, e.g. Tailor for instance), but not from RCS. I ended up following the instructions given in the TWiki Mercurial Contribution page. I have some minor corrections, though, so here we go:

-1. (Step 6 in TWiki docs) Ensure all your files are checked in RCS. I won’t copy the advice from the TWiki page here, because I believe in meaningful commit messages and would urge you to do a manual check. 0. You’ll need cvs20hg and rcsparse which you will find here. You’ll need to have Python development libraries installed, i.e. Python.h. For Debian systems, this is in package python-dev. Installation is as simple as two “./setup install” as root which will install the relevant libraries and Python scripts. 1. Create a new directory for your new mercurial repository (named REPO-HG, replace that name):

    mkdir REPO-HG

2. Initialize the repository:

   hg init REPO-HG

3. (Step 4 in the TWiki document) Create a new copy of your old RCS repository (named REPO here, replace that with the name containing your old RCS files), add a CVSROOT and a config file (mistake one in the TWiki docs: As with all CVS data, the “config” file needs to go to CVSROOT, not to CVSROOT/..). Of course, if you’re no longer interested in your old data, you may omit the initial copy.

    mkdir tmp
    cp -ar REPO tmp/REPO-old
    mkdir tmp/CVSROOT
    touch tmp/CVSROOT/config

4. Inside your directory with the old RCS data, move everything out of the RCS subdirectories (mistake two in the TWiki docs: the double-quotes need to go before the asterix):

   find tmp/REPO-old -type d -name RCS -prune | while read r; do mv -i "$r"/* "$r/.."; rmdir "$r"; done

5. Run cvs20hg to copy your old repository to mercurial. If you don’t follow the directory scheme shown below, you’ll end up with your new mercurial repository missing the initial letter of the name of all top-level files and directories.

   cvs20hg tmp/REPO-old `basename tmp/REPO-old` REPO-HG

6. Check that everything looks like you would expect:

   cd REPO-HG
   hg log

7. If you had files in your old directory not under version control that you’ll like to keep, copy them over. This might be a good time to think about whether they are worth having them under revision control. Afterwards throw away any old directory you no longer need (i.e., your original REPO, tmp/*).

Page 1 of 1, totaling 3 entries