Jul 24

What do version control and testing have to do with each other? Well, first of all, both are common virtues in the clean code community. Both are important in their own right: version control provides a safety net in that you can roll back to prior versions if you accidentally introduce problems in your code. Testing (automated unit tests) provides a safety net, too, because you can do regression testing when you work with your code. These are both fine goals but seemingly have little to do with each other.

But in reality they do. For the sake of argument, let's take a step back and assume that you have to work in an environment of several developers where neither of these things exists. What will you likely see? What we all saw several years ago: commented-out code blocks, redundant and often misleading or outdated comments, timestamps with comments cluttered all over the code. And frightened developers who feared every minor change because of the myriad of subtle side effects it might have, let alone major changes to core components. It's an environment in which refactorings as well as extensions are very hard and expensive, which results in frightened, overworked developers and frustrated managers.

So, what happens when you introduce only one of these virtues? Say we introduce version control. Now every change gets documented, except that documenting every change requires, from the developers' point of view, documentation at the wrong point. They can't see the documented changes and the reasons for these changes in the source; they see them only in the version control system, and only if they add a change message with every change at all. Much more likely you will see commit messages such as "." or "bug fix", and the same old mess of timestamps, outdated comments and commented-out code as before. Why is that? Because your developers are now not as frightened as they used to be (they can rely on the version control system to fall back to older versions), but they still have the same need to understand and document the code. And the commit log is both "too far away" from the code and "outside its purpose" for this task: the commit log shouldn't document what the code is supposed to do, only when something was changed to behave in a particular way.

This is where a development (unit) test suite comes into the picture: you document every required behaviour in tests. With every change to the code, you also update the tests. As a developer, you can now look into your test suite to see what the code is supposed to do. Developers will likely become much more confident with their changes, because they can run the tests and see what happens (hopefully almost immediately) without requiring time- and resource-consuming manual tests.
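To illustrate what "documenting behaviour in tests" can look like, here is a minimal sketch using xlunit, the Common Lisp test framework that also appears in a later post on this page; parse-date and the expected values are made up:

(use-package :xlunit)

(defclass parse-date-tc (test-case) ())

;; Each test method documents one required behaviour of the
;; (hypothetical) parse-date function.
(def-test-method iso-dates-are-parsed ((tc parse-date-tc))
  (assert-equal '(2009 7 24) (parse-date "2009-07-24")))

(def-test-method garbage-is-rejected ((tc parse-date-tc))
  (assert-equal nil (parse-date "not a date")))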

But what about documenting the changes to the code? Well, you simply document any changes in the commit message of your version control system, because it's no longer necessary to keep the entire version history in mind to understand what the current code is supposed to do: you have the tests that tell you what the code should do. The commit log now serves only to document what has changed over time. So you don't have to keep the clutter in your code, resulting in much cleaner source files.

Summary: Taken together, version control and testing add up to more than the simple sum of their individual values.

Posted by Holger Schauer

Jun 9

Bertrand Mathieu's blog article on cleaning up trailing whitespace comes in handy for me, as I've frequently run into problems with the Python test coverage tool stumbling over trailing whitespace. It also reminded me of an Emacs snippet I recently installed to detect a mix of spaces and tabs in my Python buffers: I do set (setq indent-tabs-mode nil) in my .emacs for python-mode, yet I still occasionally manage to insert some tabs in my source buffers. So I came up with the following snippet, which validates that a buffer in python-mode doesn't contain any tabs. It's hooked up with the very general write-file-hooks, as there is no python-specific hook for saving buffers. In case the buffer does contain a tab, it will leave the point (the place where your cursor will be) at the found tab.


; code snippet GUID 32F94179-A86B-4780-8645-8A526BD8533A
(defun no-tab-validation (buffer)
  "Signal an error if BUFFER contains a tab, leaving point at the tab."
  (interactive "bValidate buffer:")
  (unless (equal (buffer-name (current-buffer)) buffer)
    (switch-to-buffer buffer))
  ;; search the whole buffer, not just the part after point
  (let ((tab-pos (save-excursion
                   (goto-char (point-min))
                   (re-search-forward "\t" nil t))))
    (when tab-pos
      (goto-char tab-pos)
      (error "Buffer %s contains tabs" buffer))))

(defun py-tab-validate-on-save-hook ()
  (when (equal mode-name "Python")
    (no-tab-validation (buffer-name (current-buffer)))))

(add-hook 'write-file-hooks 'py-tab-validate-on-save-hook)

;;; inhibit tabs in some modes
(defun set-indent-tabs-mode (value)
  "Set indent-tabs-mode to value."
  (setq indent-tabs-mode value))

(defun toggle-tabs ()
  "Toggle `indent-tabs-mode'."
  (interactive)
  (set-indent-tabs-mode
   (not indent-tabs-mode)))

(defun disable-tabs-mode ()
  (set-indent-tabs-mode nil))

(add-hook 'sgml-mode-hook 'disable-tabs-mode)
(add-hook 'xml-mode-hook 'disable-tabs-mode)
(add-hook 'python-mode-hook 'disable-tabs-mode)

Posted by Holger Schauer

Apr 16

One of the nicer Mozilla extensions I use is It's All Text. It allows you to call an external editor to edit text fields, which is a really nice thing if you're editing longer wiki pages, for instance.

Not surprisingly, my external editor of choice is XEmacs. But my XEmacs is heavily configured and loads a lot of stuff. Even worse, I usually run a beta version with debugging turned on, so it runs extra slow. The time gap between clicking that little "edit" button below a text field and the moment I could finally start typing quickly began to annoy me.

I remembered that some years ago I used to log in remotely to a running machine, fire up some script and have either my running XEmacs session connected or a new XEmacs process started. Connecting to the running XEmacs happens via gnuclient, so much was clear, but gnuclient doesn't start a new XEmacs if there is none running (actually, the server process gnuserv will die when the XEmacs process terminates, so there isn't much gnuclient could do). The Emacs wiki page linked to above already contains a number of scripts that eventually do what I need, but I found none of them convincing, so here is my version. It's Linux-specific in its BSD-style call to 'ps' to determine the running processes, but should be portable sh otherwise. I could have sprinkled the fetch_procs function with OS-specific variants, but as I'm currently running my XEmacs on Linux only, I left that as an exercise for the reader.

[geshi lang=bash]
#!/bin/sh

emacs=xemacs
gnuclient=gnuclient
gnuserv=gnuserv
user=`id -un`
gnuclient_opts=""

fetch_procs () {
    ps xU $user
}

running_gnuserv=`fetch_procs | grep "[^]]$gnuserv"`
running_emacs=`fetch_procs | grep "[^]]$emacs"`
if [ "$?" -eq "0" ]; then
    gnuserv_pid=`echo $running_gnuserv | cut -f1 -d' '`
    emacs_pid=`echo $running_emacs | cut -f1 -d' '`
    echo "Using running gnuserv $gnuserv_pid of emacs $emacs_pid"
    $gnuclient $gnuclient_opts "$@"
else
    echo "Starting new $emacs"
    $emacs "$@"
fi
[/geshi]

ObTitle: Panic at the disco, from the album “A fever you can’t sweat out”

Posted by Holger Schauer

Feb 6

I've become quite addicted to writing tests during my development tasks. I had wanted to dig into test-driven development for quite some time, but it was the seamless integration of Test::Unit, Ruby's unit testing module, into Eclipse that got me going initially. I then did some unit testing with Common Lisp packages and am currently heavily using pyunit and Python doctests (mostly in the context of Zope testing). Writing tests has become second nature to me during development: it gives you that warm, fuzzy feeling of having a little safety net while modifying code.

However, there are times when terminology comes along and gives you a headache. One piece of terminology I've learned about during the last year is the difference between unit tests, integration tests and functional tests (for an overview see Wikipedia on software testing). But as you can see, for instance, in this article on integration tests in Rails, it's not always easy to agree on what means what: Jamis and/or the Rails community seem to have the integration/functional distinction exactly backwards from what, for instance, the Zope community (on testing) thinks.

Now, one might argue that terminology doesn't matter much as long as you write tests at all, but it's not so easy. For instance, if your "unit test" of a given class requires another class, is that still a unit test or is it an integration test? Does it even make sense to talk about unit-testing a class? A class on its own isn't that interesting after all; it's its integration and interoperation with collaborators where the semantics of a class and its methods become interesting. Hence, shouldn't you rather test a specific behaviour, which probably involves those other classes? And what if your code only makes sense when run on top of a specific framework (Zope, Rails, you name it)? Michael Feathers argues convincingly in his set of unit testing rules that any such tests are probably something else.

Ultimately, these questions directly pertain to two aspects: code granularity and code dependencies (and remember, test code is code after all). These are directly related, of course: if your code is very fine-grained, it's much more likely that it will also be much more entangled (although a dependency might be abstracted away with the help of interfaces or some such, you still have the dependency as such). As a consequence, your test code will have to mimic these dependencies. Conversely, if your code blocks are more coarse-grained (i.e. cover a greater chunk of functionality), you might have fewer (inter-)dependencies, but you won't be able to test functionality on a more fine-grained level. As Martin Fowler's excellent article "Mocks aren't stubs" discusses in detail, one way to loosen these connections between code and tests is to use mock objects or stubs. Fowler's article also made clear to me that I had used the term "mock object" wrongly in my post on mock objects in Common Lisp: dynamically injecting an object/function/method (as a replacement for a collaborator required by the code under test) that returns an expected value means using a stub, not a mock. Another sign of not clearly enough defined terminology (by the way, the terminology Fowler uses is that of Gerard Meszaros' xUnit patterns book).

It's worth keeping these things apart because of their different impact on test behaviour: mocks force you to think about behaviour, whereas stubs focus on the 'results' of code calls (or on object state, if you think in terms of objects being substituted). As a result, when you change the behaviour of the code under test (say you're changing code paths in order to optimize some code block), this might (with mocks) or might not (with stubs) result in changes to the test code.
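To make the distinction concrete, here is a minimal Common Lisp sketch; all names are made up, and the technique of swapping in replacement functions is the one from the mock-objects post further down this page:

;; A stub only returns a canned result; the test then checks results/state:
(defun stub-retrieve-data (id)
  (declare (ignore id))
  '(:id 42 :value "foo"))

;; A mock additionally records the interaction, so the test can verify
;; behaviour: which arguments were passed, and how often it was called.
(let ((calls '()))
  (flet ((mock-retrieve-data (id)
           (push id calls)
           '(:id 42 :value "foo")))
    (mock-retrieve-data 42)         ; the code under test would call this
    (assert (equal calls '(42)))))  ; behaviour verification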

It's also worth thinking about mocks and stubs because they shed new light on the question of test granularity: when you substitute real objects either way, you're on your way to much more fine-grained tests, which implies that you loosen the dependencies of your tests: you can now modify the code of the collaborator class without the test for your code under test breaking. Which brings us back full circle to the distinction between unit tests and integration tests: you might now have perfect unit tests, but you're forced to additionally test the integration of all the bits and pieces. Otherwise, all your unit tests might succeed while your integrated code still fails. Given this relationship, it seems immediately clear that 100% test coverage might not be the most important issue with unit tests: you might have 100% unit test success, but 100% integration failure at the same time (if you don't do continuous integration and integration tests, of course). What's interesting is that while it's possible to check test coverage of code paths, it might not be as easy to check integration coverage. I would be interested to learn about tools detailing such information.

Recently I had another aha moment with regard to testing terminology: Kevlin Henney's presentation at this year's German conference on object-oriented programming, the OOP 2009, on "know your units: TDD, DDT, POUTing and GUTs". TDD is test-driven development, of course. The other ones might be less obvious: GUTs are just "good unit tests" and POUT is "plain old unit testing", i.e. testing after the fact, while DDT is "defect-driven testing". I saw myself doing TDD, but come to think of it, I'm mostly applying a combination of TDD, POUT and DDT. I find the introduction of a term for testing after the code has been written interesting, because it provides a way to talk about how to introduce testing in the first place. Especially defect-driven testing, the idea of writing a test to pinpoint and overcome an erroneous code path, might be a very powerful way to introduce the habit of regularly writing (some) tests for an existing large code base. That way you avoid the pitfall of never being able to test "all this lots of code because there is never the time for it", and you might also motivate people to try writing tests before code. At this level, it might at first not even be that relevant to make the distinction between integration and unit tests too clear: start out with whatever is useful.
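A sketch of what defect-driven testing can look like in practice, again with xlunit and made-up names: when a bug report comes in, you first write a failing test that reproduces the defect, and then fix the code until the test passes.

(use-package :xlunit)

(defclass parse-date-regression-tc (test-case) ())

;; Hypothetical regression test: suppose parse-date used to blow up on
;; empty strings. This test pins the defect down before the fix and
;; guards against its return afterwards.
(def-test-method empty-string-does-not-signal ((tc parse-date-regression-tc))
  (assert-equal nil (parse-date "")))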

Posted by Holger Schauer

Jan 13

Today I stumbled upon a list of the top ten semantic web products of 2008 (according to ReadWriteWeb), which is interesting if only for the varying range of the products listed. There is, for instance, a browser extension that helps with blogging (Zemanta) right next to an API providing semantic analysis of texts (OpenCalais). Another interesting point to me is that there seem to be somewhat contradictory directions: while some, e.g. Yahoo with its SearchMonkey, seem to head in the direction of open access, others like Hakia are strictly closed-shop approaches. It will be very interesting to see where future development is headed, given that knowledge is not necessarily a unique thing, as can easily be seen in the many different ontologies that have been built or in the edit wars going on in Wikipedia. Probably these differences go hand in hand with company size or profit: the list features the odd start-up as well as large corporations. I don't like to sound like Cassandra, but given the recent financial crisis, I sincerely hope that these precious and small commercial flowers won't starve in another AI winter 2.0.

But perhaps there is a chance that open source approaches bring some movement: today I also happened to learn about Nepomuk, a "Networked Environment for Personalized, Ontology-based Management of Unified Knowledge", which happens to be integrated in the latest KDE version, 4.0. Although, from what I hear from my KDE-using friends, none of them seems to love that particular integration; I can't judge whether that has more to do with the backend or with the integration itself.

Posted by Holger Schauer

Dec 10

I've become a Python programmer, too, lately, due to a job change. Python is a fine language so far, although to me it feels mostly just like Ruby, though with even less functional flavour. However, just as with Ruby, I really miss slime, the superior lisp interaction mode for Emacs, when hacking Python code. I could now start to write down a list of things I'm missing (which I had intended to do), but Andy Wingo spares me the hassle, as he has just written an excellent article on slime from a Python programmer's view.

However, I would like to elaborate a little on the main difference for me: the client/server socket approach of slime. Let me briefly recapitulate what this implies: slime consists of two parts, a client written in Emacs Lisp and a server written in Common Lisp (AFAIK there is at least also an implementation for Clojure, maybe also one for some Scheme implementation). In order to use slime in its full glory, it's hence required that you have a Common Lisp process running which in turn runs the slime server part. If you now fire up slime, you get an interaction buffer over which you can access the REPL of the lisp process, which in Python would be the interpreter prompt. You can then interact with the lisp process, evaluating pieces of code from your lisp source code buffer directly in the connected lisp process. What is incredibly useful for me is that you can not only start a new lisp process but also connect to an already running lisp process, given that it has the slime server started (this is obviously mainly useful if the lisp implementation you use has multi-threading capabilities). I use it to connect to a running web server application, which I can then inspect, debug and modify. Modification includes redefinition of functions, macros and classes, which of course is a particular highlight of Common Lisp. I would like to cite a comment the reddit user "fionbio" made on the linked article: "In fact, Python language wasn't designed with lisp-style interactive development in mind. In CL, you can redefine (nearly) anything you want in a running program, and it just does the right thing. In Python, there are some problems, e.g. instances aren't updated if you modify their class. Lisp programmers often, though not always, refer to various things (functions, classes, macros, etc.) using symbols, while Python programs usually operate with direct references, so when you update parts of your program you have much higher chances that there will be a lot of references to obsolete stuff around."
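As an aside, the server part can be started from within any running Lisp; a minimal sketch, assuming the swank library that ships with slime is already loaded:

;; Make a running Lisp reachable for slime: start the swank server in
;; it, then use M-x slime-connect from Emacs to attach to port 4005.
(swank:create-server :port 4005 :dont-close t)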

To complement Bill Clementson's excellent article series on slime a little, I'm going to describe how I'm using/configuring python-mode to make it match my expectations a little more closely. Essentially, I would like to access my Python process just as I would with slime and Common Lisp, but that's not possible. The setting, btw., is nearly the same: I need to work on a web server app (written for Zope) which may not even run on the same machine I'm developing on. Let's first cover the simple stuff: to enable a reasonable command interface to the Python interpreter, I require the ipython Emacs library. If the Python interpreter runs locally, I also use py-complete, so that I can complete my code at least a little. Unfortunately, this breaks when the Python interpreter doesn't run locally, because py-complete needs to set up some things in the running Python process, which it does by writing to a local temp file and feeding that to the Python process. Unfortunately, the code in py-complete lacks customizability, i.e., you can't specify where that temp file should be located; I should be able to come up with a small patch in the near future, which I will add below. Finally, I also require doctest-mode as support for writing doctests, but that's not really relevant here.

Now, on to the more involved stuff: I introduce some new variables and a new function py-set-remote-python-environment, which uses those variables to do a remote call (via ssh) to Python. This at least allows me to do things like setting py-remote-python-command to "/home/schauer/zope/foo-project/bin/instance" and py-remote-python-command-args to "debug", so that I can access a remote debug shell of my current Zope product. That alone only allows me to fire up and access the remote Python, so I could develop the code locally while having it executed remotely. More typical, though, is that you also want to keep the code on the remote machine: for this I use tramp, a package for remotely accessing files/directories from within Emacs. In combination, this allows me to edit and execute the code on the remote machine. It is still nowhere near what is possible with slime, but at least it allows me to pursue my habit of incremental and interactive development from within my usual Emacs installation (i.e., it doesn't require me to deal with any Emacs-related hassle on the remote machine).


;;; python-stuff.el --- python specific configuration

(when (locate-library "ipython")
  (require 'ipython))

(when (locate-library "doctest-mode")
  (require 'doctest-mode))

(defvar py-remote-connect-command "ssh"
  "*Command for connecting to a remote python, typically \"ssh\".")
(defvar py-remote-connection-args '("user@remotemachine")
  "*List of strings of connection options.")
(defvar py-remote-python-command "python"
  "*Command to execute for python.")
(defvar py-remote-python-command-args '("-i")
  "*List of string arguments to be passed to `py-remote-python-command'.")
(defvar py-remote-python-used nil
  "Remember if remote python is used.")

(defun py-set-remote-python-environment ()
  (interactive)
  (let ((command-args (append py-remote-connection-args
                              (list py-remote-python-command)
                              py-remote-python-command-args)))
    (setq py-python-command py-remote-connect-command)
    (setq py-python-command-args command-args))
  (setq py-remote-python-used t))

(when (locate-library "py-complete")
  (autoload 'py-complete-init "py-complete")
  (defun my-py-complete-init ()
    "Init py-complete only if we're not using remote python."
    (if (not py-remote-python-used)
        (py-complete-init)))
  (add-hook 'python-mode-hook 'my-py-complete-init))

Posted by Holger Schauer

Nov 11

One of the bigger practical problems with unit testing is isolating the test coverage. Say you want to test a piece of code from the middle (business) layer. Let's assume further that the piece of code under consideration makes some calls to lower-level code to retrieve some data. The problem of test coverage isolation is that if you "simply" call your function, you are implicitly also testing the lower-level code, which you shouldn't: if that lower-level code gets modified in an incorrect way, you would suddenly see your middle-level code fail although no change was made to it. Let's explore ways to avoid this problem in Common Lisp.

There is a very good reason why you would also want to have such test dependencies: to ensure that your middle-level code still works if the lower-level code is extended or modified. But that is no longer unit testing: you are then doing so-called integration tests, which are related, but still different beasts.

Now, I was facing exactly this typical dreaded situation: I extended an application right above the database access layer, which had not seen many tests yet. And of course, I didn't want to go the long way (which I will eventually have to go anyway) and set up a test database with test data, write setup and tear-down code for the db, etc. The typical suggestion (from the xUnit crowd) is to use mock objects, which finally brings us on topic. I was wondering whether there are any frameworks for testing with mock objects in Lisp, but a quick search didn't turn up any results (please correct me if I've missed something). After giving the issue a little thought, it seemed quite clear why there aren't any: probably because it's easy enough to use home-grown solutions such as mine. I'll use xlunit as the test framework, but that's not relevant. Let's look at some sample code we want to test:

[geshi lang=lisp]
(defun compare-data (data &key connection)
  (declare (ignorable connection))
  (let ((dbdata (retrieve-data-by-id (id data))))
    (when (equal (some-value data) (some-db-specific-value dbdata))
      t)))
[/geshi]

The issue is with retrieve-data-by-id, which is our interface to the lower-level database access.
And note that we’ll use some special functions on the results, too, even if they may just be accessors.
Let's assume the following test code:

[geshi lang=lisp]
(use-package :xlunit)

(defclass comp-data-tc (test-case)
  ((testdata :accessor testdata :initform (make-test-data))))

(def-test-method comp-data-test ((tc comp-data-tc))
  (let ((result (compare-data (testdata tc))))
    (assert-equal result t)))
[/geshi]

Now the trouble is: given the code as it is, the only way to make the test succeed is to ensure that make-test-data returns an object whose values match values in the database that is used when compare-data gets called. You're ultimately tying your test code (especially the result of make-test-data) to a particular state of a particular database, which is clearly unfortunate. To overcome that problem, we'll use mock objects and mock functions. Let's define a mock object class mock-data and a function mock-retrieve-data, which produces a replacement for the real data access function.

[geshi lang=lisp]
(defclass mock-data ()
  ((id :accessor id :initarg :id :initform 0)
   (val :accessor some-db-specific-value :initarg :val :initform "foo-0")))

(defun mock-retrieve-data (testcase)
  (format t "Establishing mock for retrieve-data-by-id~%")
  (lambda (id)
    (format t "mock retrieve-data-by-id, id: ~A~%" id)
    (find-if #'(lambda (elem)
                 (equal (id elem) id))
             (testdbdata testcase))))
[/geshi]

Why mock-retrieve-data returns a closure will become clear in a second, after we've answered the question of how this entirely differently named class and function can be of any help. The answer lies in CL's facility to assign different values (or better said, definitions) to variables (or better said, to the function slots of symbols). What we'll do is simply assign the function we've just created as the function to use when retrieve-data-by-id is going to be called. This happens in the set-up code of the test case:

[geshi lang=lisp]
(defclass comp-data-tc (test-case)
  ((testdata :accessor testdata :initform (make-test-data))
   (testdbdata :accessor testdbdata :initform nil)
   (func :accessor old-retrieve-func :initform nil)))

(defmethod set-up ((tc comp-data-tc))
  ;; set up some test data
  (dotimes (number 9)
    (setf (testdbdata tc)
          (append (list (make-instance 'mock-data
                                       :id number
                                       :val (format nil "value-~A" number)))
                  (testdbdata tc))))
  ;; establish our mock function, remembering the old definition
  (when (fboundp 'retrieve-data-by-id)
    (setf (old-retrieve-func tc) (fdefinition 'retrieve-data-by-id)))
  (setf (fdefinition 'retrieve-data-by-id) (mock-retrieve-data tc)))

(defmethod tear-down ((tc comp-data-tc))
  ;; after the test has run, re-establish the old definition
  (when (old-retrieve-func tc)
    (setf (fdefinition 'retrieve-data-by-id) (old-retrieve-func tc))))
[/geshi]

You can now see why mock-retrieve-data returns a closure: this way, we can hand the test data we establish in the test case down to the mock function without resorting to global variables.

Now, the accessor fdefinition comes in extremely handy here: we use it to assign a different function definition to the symbol retrieve-data-by-id, which will then be called during the unit test of compare-data. Running the test then prints:

..Establishing mock for retrieve-data-by-id
mock retrieve-data-by-id, id: 0
F
Time: 0.013

There was 1 failure: ...

There is also symbol-function, which could be applied similarly and which might be used to tackle macros and special operators. However, the picture isn't as complete as one would like: methods aren't covered, for instance. And it probably also won't work if the function to mock is used inside a macro. There are probably many more edge cases not covered by the simple approach outlined above. Perhaps lispers smarter than me have found easy solutions for these, too, in which case I would like to learn about them.

Posted by Holger Schauer

Feb 22

This post is mainly a reference post about a particular topic whose solution wasn't immediately obvious to me from the CL-SQL docs. Using CL-SQL with (enable-sql-reader-syntax), I had written a routine that basically looks like this:

[geshi lang=lisp]
(defun data-by-some-criteria (criteria &key (dbspec +db-spec+) (dbtype +db-type+))
  (with-database (db dbspec :database-type dbtype :if-exists :old)
    (let (dbresult)
      (if criteria
          (setq dbresult (select 'some-model 'other-model
                                 :where [and [= [some.criteria] criteria]
                                             [= [some.foreignid] [other.id]]]
                                 :order-by '([other.name] [some.foreignid]
                                             [year] [some.name])
                                 :database db))
          (setq dbresult (select 'some-model 'other-model
                                 :where [and [null [some.criteria]]
                                             [= [some.foreignid] [other.id]]]
                                 :order-by '([other.name] [some.foreignid]
                                             [year] [some.name])
                                 :database db)))
      (when dbresult
        (loop for (some other) in dbresult
              collect some)))))
[/geshi]

This is ugly, because the only difference between those two select statements is the check for the criteria, but I had no idea how to combine the two select statements into one: it's not possible to embed Lisp code (apart from symbols) into an sql-expression (i.e. the type of arguments to :where or :order-by etc.). With the next requirement, things would become far worse: the order-by statement needed to become more flexible, so that it would be possible to sort results by year first. Given the approach shown above, this would result in at least four select statements, which is horrible. So, naturally, I wanted a single select statement with programmatically obtained :where and :order-by sql expressions.

Step 1: It occurred to me that it should be possible to have the arguments in a variable and simply refer to the variable. E.g., using a simpler example:

[geshi lang=lisp]
(let (where-arg)
  (if (exact-comp-needed)
      (setq where-arg '[= [column] someval])
      (setq where-arg '[like [column] someval]))
  (select 'model :where where-arg))
[/geshi]

So I could now have my two different where-args and two different order-args and use a single select statement. Main problem solved.

Step 2: But for the :where arg in my original problem, only a small fraction of the sql-expression differs. So how do I avoid hard coding the entire value of where-arg? How can I combine some variable part of an sql-expression with some fixed parts? I.e, ultimately I want something like:

[geshi lang=lisp]
(let (comp-op where-arg)
  (if (exact-comp-needed)
      (setq comp-op '=)
      (setq comp-op 'like))
  (setq where-arg '[ <put comp-op here> [column1] someval])
  (select 'model :where where-arg))
[/geshi]

But with CL-SQL modifying the reader, there seems to be no way to make <put comp-op here> work. I didn't know how to get the usual variable evaluation into the sql-expression, or how to escape from CL-SQL's sql-reader-syntax back to normal Lisp evaluation.

Somewhere in the back of my head there was that itch that CL-SQL might offer some low-level access to sql expressions. And indeed it does. There are two useful functions, sql-expression and sql-operation. sql-operation "returns an SQL expression constructed from the supplied SQL operator or function operator and its arguments args" (from the CL-SQL docs), and we can supply the operator and its arguments from Lisp, which is exactly what I want.

Now, the nice thing is that it's easy to mix such partly handcrafted sql expressions with CL-SQL's special sql syntax constructs that are automatically handled by the reader (if you enable it via enable-sql-reader-syntax, of course). I.e., for <put comp-op here> we can use sql-operation, but the rest stays essentially the same:

[geshi lang=lisp]
(let (where-arg)
  (if (exact-comp-needed)
      (setq where-arg (sql-operation '= [column1] someval))
      (setq where-arg (sql-operation 'like [column1] someval)))
  (select 'model 'other-model :where where-arg))
[/geshi]

Now, coming back to my original problem: based on this approach, I can split out the common part of the :where and :order-by arguments, combine those with the varying parts as needed, and hand them down to a single select statement, as sketched below. Problem solved.
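For completeness, here is a sketch of what the combined routine might look like, reusing the made-up models from above (untested):

[geshi lang=lisp]
(defun data-by-some-criteria (criteria &key (dbspec +db-spec+) (dbtype +db-type+))
  (with-database (db dbspec :database-type dbtype :if-exists :old)
    (let* ((criteria-check (if criteria
                               (sql-operation '= [some.criteria] criteria)
                               (sql-operation 'null [some.criteria])))
           ;; combine the varying part with the fixed join condition
           (where-arg (sql-operation 'and
                                     criteria-check
                                     [= [some.foreignid] [other.id]])))
      (loop for (some other) in (select 'some-model 'other-model
                                        :where where-arg
                                        :order-by '([other.name] [some.foreignid]
                                                    [year] [some.name])
                                        :database db)
            collect some))))
[/geshi]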

Posted by Holger Schauer

Nov 9

Some time ago, I was looking at splitting text with Elisp, Perl, Ruby and Common Lisp. Yesterday, when I again had to do much the same thing, it occurred to me that the Common Lisp solution was unnecessarily complex and long. I'm not a Perl guru, but I believe the following is probably hard to beat even with Perl:


CL-USER> (format t "~{<li>~A</li>~%~}"
                 (cl-ppcre:split "\\|" "Kim Wilde|Transvision Vamp|Ideal|Siouxsie and the Banshees|Nena|Iggy Pop"))
<li>Kim Wilde</li>
<li>Transvision Vamp</li>
<li>Ideal</li>
<li>Siouxsie and the Banshees</li>
<li>Nena</li>
<li>Iggy Pop</li>
NIL

For the uninitiated: it's not the cl-ppcre library that is interesting here, but the built-in iteration facilities of format. See the HyperSpec on the control-flow features of format for details. Now, I usually tend to avoid the mini-languages that come with Common Lisp, like those of format or loop, when writing real programs, but when using Lisp as a glorified shell they come in very handy.
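For comparison, a sketch of the same thing without format's iteration directive, spelled out with dolist:

CL-USER> (dolist (artist (cl-ppcre:split "\\|" "Kim Wilde|Transvision Vamp|Ideal"))
           (format t "<li>~A</li>~%" artist))
<li>Kim Wilde</li>
<li>Transvision Vamp</li>
<li>Ideal</li>
NIL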

Posted by Holger Schauer

Oct 17

For a long time I hadn't looked closely at those modern distributed revision control systems like Git, Darcs or Mercurial. This was mainly due to two facts: I'm currently neither involved in any major open source project which uses these systems nor in a project at work which requires the facilities they offer, and there was no easy access to them in XEmacs, so the more traditional systems like Subversion, CVS and RCS were fine for me. However, there was this nagging feeling that I might be missing something, and as revision systems have always been somewhat of a pet peeve of mine, I eventually spent some time reading up on them. I've read quite a lot of discussions on the web and gathered that Mercurial might be worth a closer look, as it claims to be quite easy to handle, comparably well documented and quite fast. And then, finally, I read on xemacs-beta that the new vc package (in pre-release) would support Mercurial as well.

Well, that's where I am now: I have several pieces of code lying around which I sometimes develop on my main machine and sometimes on my laptop when moving around. This is the scenario where a server-based approach to revision control is not what you want: you won't be able to access your server while you're on the road, and hence you can't commit. Now, with RCS that's not a problem, as there is no server involved. But of course, since RCS is a file-system-local revision system, syncing is a major problem, and you have to go to great pains to ensure you don't overwrite changes made locally in between syncs. I hope that a distributed version control system like Mercurial will solve the problem, as I no longer have to decide which version is the current head version, and can instead cherry-pick change sets at will.

But of course, for this to happen, I have to convert my RCS repositories to Mercurial. This doesn't seem to be a common problem: there are a lot of tools for conversion from CVS or Subversion (see the Mercurial wiki, e.g. Tailor), but not from RCS. I ended up following the instructions given on the TWiki Mercurial Contribution page. I have some minor corrections, though, so here we go:

-1. (Step 6 in the TWiki docs) Ensure all your files are checked in to RCS. I won't copy the advice from the TWiki page here, because I believe in meaningful commit messages and would urge you to do a manual check.

0. You'll need cvs20hg and rcsparse, which you will find here. You'll need to have the Python development libraries installed, i.e. Python.h; on Debian systems, this is in the package python-dev. Installation is as simple as two "./setup install" runs as root, which will install the relevant libraries and Python scripts.

1. Create a new directory for your new mercurial repository (named REPO-HG, replace that name):

    mkdir REPO-HG
2. Initialize the repository:
   hg init REPO-HG
3. (Step 4 in the TWiki document) Create a new copy of your old RCS repository (named REPO here; replace that with the name of the directory containing your old RCS files), and add a CVSROOT directory with an empty config file (mistake one in the TWiki docs: as with all CVS data, the "config" file needs to go into CVSROOT, not into CVSROOT/..). Of course, if you're no longer interested in your old data, you may omit the initial copy.
    mkdir tmp
    cp -ar REPO tmp/REPO-old
    mkdir tmp/CVSROOT
    touch tmp/CVSROOT/config
4. Inside your directory with the old RCS data, move everything out of the RCS subdirectories (mistake two in the TWiki docs: the double quote needs to go before the asterisk):
   find tmp/REPO-old -type d -name RCS -prune | while read r; do mv -i "$r"/* "$r/.."; rmdir "$r"; done
5. Run cvs20hg to copy your old repository to mercurial. If you don't follow the directory scheme shown below, you'll end up with your new mercurial repository missing the initial letter of the names of all top-level files and directories.
   cvs20hg tmp/REPO-old `basename tmp/REPO-old` REPO-HG
6. Check that everything looks like you would expect:
   cd REPO-HG
   hg log
7. If you had files in your old directory not under version control that you'd like to keep, copy them over. This might be a good time to think about whether they are worth having under revision control. Afterwards, throw away any old directories you no longer need (i.e., your original REPO and tmp/*).

Posted by Holger Schauer

Aug 8

For a review, I needed to get the track list of a given CD. As the track list wasn't available via CDDB, I went to some large online store and found it there. I needed to convert it to XML, though. The original data I fetched looks like this:

1. Fox In A Box
2. Loaded Heart
3. All Grown Up
4. Pleasure Unit
...

whereas I need:

<li id="1">Fox In A Box</li>
<li id="2">Loaded Heart</li>
<li id="3">All Grown Up</li>
...

After cutting the original data over to my Emacs, writing out a simple file and using Perl for that simple transformation seemed just gross. In the past, I've been an Emacs hacker. But no more, or so it seems, since it took me nearly half an hour just to come up with this simple function:

(defun tracklist-to-li (point mark)
  "Generate a string with <li>-elements containing tracks.
Assumes that on every line of the region, a track position
and the track name are given."
  (interactive "r")
  (save-excursion
    (goto-char point)
    (let ((result ""))
      (while (re-search-forward "^\\([0123456789]+?\\)\\.[ \t]+\\(.*\\)$"
                                mark t)
        (setq result
              (concat result
                      "<li id=\"" (match-string 1) "\">"
                      (match-string 2)
                      "</li>\n")))
      (message result))))

What took the most time was that I had forgotten to escape the grouping parentheses in the regular expression, and that it took me a little while to accept that there is really no \d or equivalent character class in Emacs regexps. Which probably means that I've been doing too much in Perl, sed and the like. OTOH, it may just hint at the horror of regular expression handling in Emacs. What I also dislike is that whenever you want to compute some result in Emacs and also see it, you have to invoke an interactive operation like message. Of course, there is IELM, but this doesn't really help with interactive functions operating on regions.

And five minutes later, I realized I needed to convert some string like "The (International) Noise Conspiracy|The Hi-Fives|Elastica" into a similar list structure. With a simple cut & paste, roughly 30 seconds later I had:

[bauhaus->~]perl -e '$a="The (International) Noise Conspiracy|The Hi-Fives|Elastica"; @a=split(/\|/,$a); foreach $b (@a) { print "<li>$b</li>\n"; }'
<li>The (International) Noise Conspiracy</li>
<li>The Hi-Fives</li>
<li>Elastica</li>

Hmm. Perhaps I've come quite a long way on the dark side already... On the other hand, in Ruby this is just as simple (I'm using irb, the interactive Ruby shell, here):

irb(main):008:0> a="The (International) Noise Conspiracy|The Hi-Fives|Elastica"
=>"The (International) Noise Conspiracy|The Hi-Fives|Elastica"
irb(main):009:0> a.split("|").each {|string|
irb(main):010:1* print "<li>"
irb(main):011:1> print string
irb(main):012:1> print "</li>\n"
irb(main):013:1> }
<li>The (International) Noise Conspiracy</li>
<li>The Hi-Fives</li>
<li>Elastica</li>
=> ["The (International) Noise Conspiracy", "The Hi-Fives", "Elastica"]

The difference here is the implicit array Ruby generates, which in Perl you could of course hide in the array position of the foreach loop. Note the annoying misfeature of irb of always showing the prompt, even when you're still continuing your current input line.

In Common Lisp we can do it just as short:

CL-USER> (let* ((a "The (International) Noise Conspiracy|The Hi-Fives|Elastica")
                (splits (ppcre:split "\\|" a)))
           (loop for string in splits
                 do (format t "<li>~S</li>~%" string)))
<li>"The (International) Noise Conspiracy"</li>
<li>"The Hi-Fives"</li>
<li>"Elastica"</li>
NIL

The same thing applies here: the result of the split could easily have been embedded in the loop.

The lesson, of course, is that in the end this example only serves to show that things that are easy to achieve in a high-level language are indeed easy to achieve. Or, to put it differently, the use of regular expressions is no longer a discriminating feature between programming languages.

Posted by Holger Schauer

Jul 4
Recently, there was a discussion about "the rise of functional languages" over on Linux Weekly News, in which one of the participants claimed that one of the major reasons why nobody uses functional languages in industrial settings is the lack of explicit resource handling (where a resource is some supposedly "alien" object in the system, say a database handle or something like that). What he was referring to was the inability to run code on allocating/deallocating a piece of resource. Of course, some people pointed him to various solutions; in particular, I pointed to the usual WITH-* style macros, in which one nests the access to the resource while hiding everything that is done on allocation/deallocation. His reply went along the lines that such objects may need to be long-lived (so a WITH- macro is inappropriate), that the only resort would be the garbage collector, and that there simply is no way of running code at a guaranteed (deallocation) time. I have to admit that I have no idea how I could code around that problem in Common Lisp (garbage collection isn't even a defined term in the ANSI specification of CL, and I'm very sure I haven't seen any mention of allocation/deallocation in it).
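For reference, here is a minimal sketch of the WITH-* idiom mentioned above, with made-up resource functions; unwind-protect guarantees the cleanup, but only for the dynamic extent of the body, which is exactly the limitation the poster complained about:

;; open-db-handle and close-db-handle are hypothetical names for the
;; allocation/deallocation code of some resource.
(defmacro with-db-handle ((var spec) &body body)
  `(let ((,var (open-db-handle ,spec)))
     (unwind-protect
          (progn ,@body)
       (close-db-handle ,var))))

;; Usage: the handle lives exactly as long as the body runs.
;; (with-db-handle (db "some-db-spec")
;;   (query db "select 1"))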

Now, some months later, there is a discussion in comp.lang.lisp on the topic of "portable finalizers", and Rainer Joswig pointed to the chapter in the Lisp machine manual which talks about explicit resource handling on the Lisp machine. From the excerpt, I can't judge whether resources are first-class CLOS objects and hence whether the functions that handle them are generic functions, but if so, that would actually allow running code on deallocating a resource, of course at the price of having to handle allocation/deallocation manually. I really wonder whether any of today's CL implementations offers the same or at least similar functionality.

Posted by Holger Schauer

