Jul 16

Kata 4 is concerned with data munging: reading a file, pulling out some data and comparing it to determine a result. Actually, it is even simpler than that, as in both cases we are asked to determine the minimum of a data set. Given a weather task and a soccer task, we are then asked to fuse the two solutions, extracting the commonalities and allowing for as much code reuse as possible. Unfortunately, I had not read the opening sentences of the kata closely; I read over the entire description instead. Hence I missed the call to solve each part separately without reading ahead, and started directly with this idea of code reuse in mind. However, as we will see, I didn’t really notice one aspect of code reuse until I solved the second task (soccer).

The first thing I did was take a closer look at the data file. Probably like everybody else who has had some exposure to Perl, I was initially tempted to approach the data extraction with regular expressions. But as JWZ said, “Some people, when faced with a problem, say ‘I know, I’ll use regular expressions.’ Now they have two problems”, and indeed this is the case here. This is not to say that the particular problem, fetching the day, minimum and maximum temperature from the provided data file, would not be solvable with regular expressions, but rather that the data file is really of a fixed-width nature. Unfortunately, there are some irregular elements, e.g. the missing values in the WxType column or the postfix ‘*’ added to exactly one value in each of the MnT and MxT columns, likely marking the absolute minimum and maximum temperature of the month, respectively. This marker causes the data in the corresponding “cell” to hang over into the whitespace prefix area of the next column. The last line is also of interest, as it starts out with “mo” in contrast to the numeric values for the days of the month. It also has decimal values in some columns (e.g. the temperature columns) where all other lines have integer values. This looks like a computed summary line, showing the averages of the respective values for the entire month, and should hence not be part of the computation. And finally, of course, the data file has some completely irrelevant lines, which we need to skip over.

Having formed some idea of how the data file is structured, I decided to parse each line using exact positions, e.g. to determine the MnT value by extracting the substring of each line from position 9 to 14. So the top-down approach is this: we open a file and try to parse each line in it according to a pattern specification, which gives the positions of each part together with a parsing function.

Clojure’s approach to file handling is a little surprising for somebody coming from Common Lisp, which provides three main concepts you need to grasp: filenames aka pathnames, files and streams. Typically, you open a file with with-open-file, which guarantees that the file will be closed after you leave the block. Clojure’s with-open macro takes this idea of safe file handling to the next level in that it is not restricted to files. However, with-open does not simply work with a filename as an argument: you need to pass in a resource which follows the open/close protocol, which a simple filename string does not. That Clojure leverages the Java IO library for this is not surprising, but that it leaks this Java dependency to the users is. I assume this implies that (with-open [rdr (clojure.java.io/reader "/some/filename.txt")] ...) will not work on ClojureCLR (apparently slurp does). read-line is also a false friend for a Common Lisp programmer, as it takes no stream argument and only reads from *in* (e.g. at the REPL), unlike its namesake in CL. While I appreciate the possibility to call Java methods from Clojure, I prefer using any abstractions that Clojure provides, so I went with line-seq instead of calling .read. Finally, as Clojure does not allow for re-assignment, we need a recursive approach with an accumulator to hold intermediate results while looping over the lines. So, the skeleton looks something like this in Clojure:

 (require '[clojure.java.io :as io])

 (defn read-some-file
   "Skeleton for reading some file"
   [filename]
   (with-open [rdr (io/reader filename)]
     ; result needs an initial value; nil stands for "nothing found yet"
     (loop [lines (line-seq rdr) result nil]
       (if (empty? lines)
         result
         ; todo for the weather task:
         ; compare the result of parsing a line with the current 'best',
         ; i.e. minimum spread value so far, and recur, possibly with different data
         (recur (next lines) result)))))

This is more or less the equivalent of Perl’s -n command-line switch or while (<>) {...} construct.

The comparison mentioned in the code’s comment is actually very easy, and what needs to be put in place for the result is also obvious. But in order to do that we have to parse the lines first. Following up on the idea of how to parse a line, the relevant function is subs, which returns the substring from a start index (inclusive) to an end index (exclusive). We basically need to parse different substrings for the various data fields, quite often returning just an integer. Again, the simplest way of parsing a string to an integer is provided by a thin layer on top of a Java library. Neither the re-find nor the exception handling would be necessary if we could guarantee that the file only ever contained correct data; this way we just silently skip over invalid data.

(defn string-to-int
  "Parses a consecutive set of numbers into an integer or return nil"
  [string]
  (try
    (Integer/parseInt (re-find #"\d+" string))
    (catch Exception e nil)))
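To see how subs and string-to-int cooperate on a fixed-width field, here is a minimal sketch. The sample line is made up for illustration (it only mimics the layout described above, including the ‘*’ marker), and string-to-int is repeated so the snippet can be evaluated on its own.

```clojure
; string-to-int as defined above, repeated so the snippet runs on its own
(defn string-to-int
  [string]
  (try
    (Integer/parseInt (re-find #"\d+" string))
    (catch Exception e nil)))

; hypothetical sample line: day 9, MxT 86, MnT 32 with the '*' marker, AvT 59
(def sample-line "   9  86    32*   59")

(subs sample-line 9 14)                  ; the raw MnT field, "   32"
(string-to-int (subs sample-line 9 14))  ; leading blanks are ignored: 32
(string-to-int "32*")                    ; the '*' marker is ignored: 32
(string-to-int "mo")                     ; no digits at all: nil
```

Note that the end index of subs is exclusive, so the ‘*’ at index 14 is not even part of the extracted MnT field here.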

parse-line is fairly straight-forward: the basic idea is that we have some specification of how a line is structured. This specification is a map of start and end positions plus parsing functions. So for the weather data, the pattern looks like this:

(def day-pattern
  ;this pattern is not complete and could be extended
  (hash-map :day [1 4 #(first-word %)]
            :MxT [5 8 #(string-to-int %)]
            :MnT [9 14 #(string-to-int %)]
            :AvT [15 20 #(string-to-int %)]))

first-word is another small helper function which just retrieves the first contiguous run of non-whitespace characters of a string:

(defn first-word
  "Returns first consecutive non-whitespace chars from string"
  [string]
  (re-find #"\S+" string))

parse-line then just loops over all parts of a pattern, extracts the substrings and calls the parsing function. It recursively conjures up a hash-map with the extracted data or returns nil if some parsing error occurs.

(defn parse-line
  "Parse a line with data in fixed positions using pattern.
Pattern should be a map consisting of a key for the data to return,
a start and end position and a parsing function for each data element.

Returns a map with all extracted data or nil for unparsable lines."
  [line pattern]
  ; loop solution with accumulator for results
  (loop [remkeys (keys pattern) linemap {}]
    (if (empty? remkeys)
      linemap
      (let [key (first remkeys)
            [start end parsefn] (get pattern key)
            ; lines that are too short make subs throw; treat that,
            ; like any failure inside parsefn, as an unparsable value
            value (try
                    (parsefn (subs line start end))
                    (catch Exception e nil))]
        (if value
          (recur (rest remkeys)
                 (conj linemap
                       (hash-map key value)))
          ; silently skip any parsing errors
          nil)))))

This then allows us to put things together: we only need to compare the difference between MxT and MnT of the current line with the smallest temperature spread seen so far. We use destructuring on the result of parse-day to retrieve the needed data, sprinkle in some sanity checks and are done.

(defn parse-day
  "Parse a day from a line"
  [line]
  (parse-line line day-pattern))

(defn find-lowest-temperature
  "Return day in weatherfile with the smallest temperature spread"
  [weatherfile]
  (with-open [rdr (io/reader weatherfile)]
    ; minspread starts out as nil ("no spread seen yet"), so that a
    ; genuine spread of 0 is not mistaken for the initial state
    (loop [lines (line-seq rdr) minday 0 minspread nil]
      (if (empty? lines)
        minday
        (let [{mnt :MnT mxt :MxT curday :day} (parse-day (first lines))
              curspread (when (and mnt mxt) (- mxt mnt))]
          (if (and curday curspread
                   (or (nil? minspread)
                       (< curspread minspread)))
            (recur (next lines) curday curspread)
            (recur (next lines) minday minspread)))))))

When I started working on the second task, the soccer problem, I did a simple copy and paste of find-lowest-temperature, added a new pattern for extracting the data and made the small changes needed to adapt to the different fields. I also understood the comparison requirement as looking at the absolute difference. This led to the following functions:

(defn abs 
  "Returns the absolute value of x" 
  [x]
  (if (pos? x) x (- x)))

(def soccer-team-pattern
  ; this pattern is not complete
  (hash-map :pos [1 5 #(first-word %)]
            :team [7 22 #(first-word %)]
            :fval [43 45 #(string-to-int %)]
            :aval [50 52 #(string-to-int %)]))

(defn parse-soccer-team
  "Parse a soccer-team from a line"
  [line]
  (parse-line line soccer-team-pattern))

(defn find-minimum-goal-difference 
  "Return team in soccerfile with the smallest difference in for and against goals"
  [soccerfile]
  (with-open [rdr (io/reader soccerfile)]
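    ; mindiff starts out as nil ("no difference seen yet"), so that a
    ; genuine goal difference of 0 is not mistaken for the initial state
    (loop [lines (line-seq rdr) minteam 0 mindiff nil]
      (if (empty? lines)
        minteam
        (let [{aval :aval fval :fval curteam :team}
              (parse-soccer-team (first lines))
              curdiff (when (and aval fval) (abs (- fval aval)))]
          (if (and curteam curdiff
                   (or (nil? mindiff)
                       (< curdiff mindiff)))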
            (recur (next lines) curteam curdiff)
            (recur (next lines) minteam mindiff)))))))

This, of course, led straight to the insight that it should be simple to extract the slight differences and make them parameters to some find-*-difference function. The following things differ: the parsing pattern, the key used to extract the result value and the function used to compute the difference between values. If you wanted to, it would also be possible to make the comparison function configurable.

(defn find-some-difference 
  "Return some result from a data file which has some lowest difference"
  [filename parse-pattern resultkey diffn]
  (with-open [rdr (io/reader filename)]
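    (loop [lines (line-seq rdr)
           result nil
           ; nil means "no difference seen yet"; a genuine difference
           ; of 0 is then not mistaken for the initial state
           mindiff nil]
      (if (empty? lines)
        result
        ; parse-line-map is the map-based parser defined further below
        (let [data-map (parse-line-map (first lines) parse-pattern)
              curresult (get data-map resultkey)
              curdiff (diffn data-map)]
          (if (and curresult curdiff
                   (or (nil? mindiff)
                       (< curdiff mindiff)))
            (recur (next lines) curresult curdiff)
            (recur (next lines) result mindiff)))))))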

(defn find-mingoal-diff-fusion
  "Return team in soccerfile with the smallest goal difference, using the fusion fn."
  [soccerfile]
  (find-some-difference soccerfile soccer-team-pattern :team
                        (fn [{aval :aval fval :fval curteam :team}]
                          (when (and aval fval)
                            (abs (- fval aval))))))
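The weather task fits the same fused function. The following is a sketch rather than the original solution: find-min-spread-fusion is a hypothetical name, the helper functions are condensed restatements of the ones in this post (parse-line-map and substring only appear further below), this restatement starts mindiff at nil, and the temporary data file contains made-up lines in the weather layout.

```clojure
(require '[clojure.java.io :as io])

; condensed restatements of the helpers from this post
(defn first-word [string] (re-find #"\S+" string))

(defn string-to-int [string]
  (try (Integer/parseInt (re-find #"\d+" string))
       (catch Exception e nil)))

(defn substring [string start end]
  (try (subs string start end)
       (catch Exception e "")))

(defn parse-line-map [line pattern]
  (into {}
        (map (fn [[key [start end parsefn]]]
               {key (parsefn (substring line start end))})
             pattern)))

(def day-pattern
  {:day [1 4 first-word]
   :MxT [5 8 string-to-int]
   :MnT [9 14 string-to-int]})

(defn find-some-difference
  [filename parse-pattern resultkey diffn]
  (with-open [rdr (io/reader filename)]
    (loop [lines (line-seq rdr) result nil mindiff nil]
      (if (empty? lines)
        result
        (let [data-map (parse-line-map (first lines) parse-pattern)
              curresult (get data-map resultkey)
              curdiff (diffn data-map)]
          (if (and curresult curdiff
                   (or (nil? mindiff) (< curdiff mindiff)))
            (recur (next lines) curresult curdiff)
            (recur (next lines) result mindiff)))))))

; hypothetical name: the weather counterpart of find-mingoal-diff-fusion
(defn find-min-spread-fusion
  [weatherfile]
  (find-some-difference weatherfile day-pattern :day
                        (fn [{mnt :MnT mxt :MxT}]
                          (when (and mnt mxt) (- mxt mnt)))))

; made-up sample data: a header line plus three days
(def sample-file (java.io.File/createTempFile "weather" ".dat"))
(spit sample-file
      (str "  Dy MxT   MnT\n"
           "   1  88    59\n"
           "   2  79    63\n"
           "   9  86    32*\n"))

(find-min-spread-fusion sample-file) ; day "2" has the smallest spread (16)
```

The header line yields no numbers for MxT and MnT, so its difference is nil and it is skipped like any other unparsable line.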

Then there was another itch I wanted to scratch: the parse-line function has some ugliness to it. For starters, it is handling possible exceptions from subs directly. It is also checking return values for nil. Both cases are what Common Lisp would see as conditions rather than real exceptions; it is rather unfortunate that Clojure opted for the simpler, although more traditional, exception concept from Java. To remedy the ugliness of parse-line we can simply replace the direct call to subs with a small handcrafted function which manages any exceptions, and also change the behavior of parse-line so that unparsable elements simply get nil as their value. But there is more that makes parse-line ugly: I dislike the recursive nature of the solution and the linear result handover in the let declaration (well, this handover was intentional, to avoid a functional train wreck of calls). I wanted to see whether I could come up with a more elegant map/reduce solution. Here you go:

(defn substring
  "Returns substring from start to end from string, or an empty string if that fails"
  [string start end]
  (try
    (subs string start end)
    (catch Exception e "")))

(defn parse-line-reduce
  "Parse a line with data in fixed positions using pattern.
Pattern should be a map consisting of a key for the data to return,
a start and end position and a parsing function for each data element.

Returns a map with all extracted data which may be empty."
  [line pattern]
  ; map-reduce version
  (reduce #(conj %1 %2)
          (concat [{}]
                (map
                 (fn [[key [start end parsefn]]]
                   {key (parsefn (substring line start end))})
                 (seq pattern)))))

We simply map over the entire pattern and use argument destructuring again to extract the relevant parts of it, but this time, due to the call to seq, each pattern part is a sequence (a key/value pair), not a map. The map call with the anonymous function produces a sequence of hash-maps with key and parsing result, which then gets reduced to a single map. In order to use reduce, you have to provide a function taking two arguments: the first receives the intermediate result, the second the next value of the sequence to reduce. This is the reason why we have this ugly concat [{}] ... in front of the call to map: we need to provide the initial value for reduce, which in this case is an empty hash-map. An even more concise version replaces the call to reduce with into, resulting in a version which looks pretty idiomatic to me and is also way easier to understand than the lengthy recursive version above.

(defn parse-line-map
  "Parse a line with data in fixed positions using pattern.
Pattern should be a map consisting of a key for the data to return,
a start and end position and a parsing function for each data element.

Returns a map with all extracted data which may be empty."
  [line pattern]
  (into {}
        (map
         (fn [[key [start end parsefn]]]
           {key (parsefn (substring line start end))})
         (seq pattern))))
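As a side note on the reduce variant: reduce also accepts an explicit initial value as its second argument, which would make the concat trick unnecessary. A minimal sketch under that assumption follows; parse-line-reduce-init is a hypothetical name, and clojure.string/trim merely stands in for the real parsing functions.

```clojure
(require '[clojure.string :as string])

; substring as defined above, repeated so the snippet runs on its own
(defn substring [string start end]
  (try (subs string start end)
       (catch Exception e "")))

; hypothetical variant: the empty map is passed to reduce as initial value,
; and each map entry of the pattern is destructured into key and spec
(defn parse-line-reduce-init [line pattern]
  (reduce (fn [linemap [key [start end parsefn]]]
            (assoc linemap key (parsefn (substring line start end))))
          {}
          pattern))

(parse-line-reduce-init "   9  86"
                        {:day [1 4 string/trim]
                         :MxT [5 8 string/trim]})
; a map with :day "9" and :MxT "86"
```

Reducing directly over the pattern map hands the function one key/value entry at a time, so the intermediate sequence of single-entry maps is not needed either.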

Of course, we can use a similar approach for the find-*-difference function. We could also take a slightly different approach and sort the parsing results and then treat the minimal value as the result. If we combine this with slurp the resulting code also becomes way more compact. In order to filter out only partially parseable lines, we need to supply a list of keys that the parsing result must have values for.

(defn sort-diff-map
  "Return the parse results for all lines of a data file that have values
for all desiredkeys, sorted by diffn, lowest difference first"
  [filename parse-pattern desiredkeys diffn]
  ; map-filter version; string is an alias for clojure.string
  (sort-by diffn
           (filter #(every? (partial get %) desiredkeys)
                   (map #(parse-line-map % parse-pattern)
                        (string/split-lines (slurp filename))))))


(defn find-mingoal-map
  "Return team in soccerfile with the smallest goal difference, using the sort-map fn."
  [soccer-file]
  ; the sorted result is a sequence of maps, so we take its first element
  ; and look up :team there (get on the sequence itself would yield nil)
  (:team (first
          (sort-diff-map soccer-file soccer-team-pattern
                         [:team :fval :aval]
                         (fn [{aval :aval fval :fval curteam :team}]
                           (when (and aval fval)
                             (abs (- fval aval))))))))
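If only the single best entry is needed, a full sort does more work than the task requires. One hedged alternative is clojure.core’s min-key, which picks the element with the smallest key value in one pass. The team maps below are hand-written stand-ins for parse results, and goal-diff is a hypothetical helper.

```clojure
; hand-written stand-ins for what the parser would return per line
(def parsed-teams
  [{:team "Arsenal" :fval 79 :aval 36}
   {:team "Aston_Villa" :fval 46 :aval 47}
   {:team "Leicester" :fval 30 :aval 64}])

; hypothetical helper: absolute difference of goals for and against
(defn goal-diff [{:keys [fval aval]}]
  (let [d (- fval aval)]
    (if (neg? d) (- d) d)))

; min-key needs at least one element, so an empty result has to be
; handled separately before calling apply
(:team (apply min-key goal-diff parsed-teams)) ; "Aston_Villa"
```

As with the sort-based version, partially parsed lines would have to be filtered out first, because goal-diff expects both values to be present.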

Summing up, what have we seen during this kata? Clojure’s platform-dependent approach to reading files, usage of regular expressions, destructuring (again), anonymous functions (again) and map/reduce. Overall not very exciting. I know it’s not a fair comparison, but I would always opt for solving such tasks with Perl, especially if they are as trivial as in this case, where each task can basically be solved with a one-liner. There is room for using more elaborate languages (e.g. Python, Ruby, Clojure) if parsing and processing become so elaborate that it makes sense to have more structure in the code. But for a task of the size of this kata, the amount of code required is usually not worth the effort.

Posted by Holger Schauer

Jun 7

I wanted to learn Clojure for a while but never made much progress beyond the point where I had a running slime connection to Clojure and a simple initial leiningen project file for a web project with ring. I just got distracted by other stuff, but more importantly, I had no goal for a web project, so I already failed at deciding what to build. Consequently, I stopped reading Fogus and Houser’s nice book Joy of Clojure. After quite a while (close to a year), I stumbled upon some posts reminding me of the code katas on the pragmatic programmers web site (cf. code katas). Somehow this triggered the idea that doing these katas might finally be a good way to learn Clojure.

In the linked blog posts I’ll describe the various solutions I worked out. This post here will serve as the main overview and get updates as I move along.

The code (as well as the original markdown for these articles) can be found on github.

Kata 14 concludes my little journey into the original katas. It took far too much time writing these, as I didn’t have enough time to work on them continuously. Besides, many of the katas I’ve left out seemed rather uninteresting and some are thought experiments.

Posted by Holger Schauer

Apr 15

Last week, I visited this year’s Goto Zürich 2013, which is a two-day conference with tutorials wrapped around it. Topic areas range from lean start-up over technology to a leaders track. I spent most of my first day on the so-called leaders track, as most talks there revolved around adopting agile and lean methods, but also went to some more technical talks in other tracks. I’ll summarize most of the talks and my impressions below. Btw., the slides for most of the talks can be found on the conference website, more precisely on the schedule overview for the two days.

The conference started out with a keynote by Scott Ambler on his work on Disciplined Agile Delivery, which provides a framework for where decisions are needed when implementing agile methods in a larger setup, drawing from many different methods like Scrum, XP, Kanban, the Scaled Agile Framework and many others. The first part of his talk was mainly concerned with the question of whether agile lives up to its promise of delivering ‘better’, which he tries to answer with regular surveys. These seem to confirm that team size, location, complexity and methods used do have an impact. Not surprisingly, small teams are more successful than larger ones, co-located teams do better than distributed ones and simple projects are much more likely to succeed than complex ones. And of course, there are still projects using waterfall-like approaches that succeed, while the difference between an iterative and an agile approach is rather minimal. Still, the number of project failures is always high; even for simple projects there are more failures than one might expect.

The first talk here was about Spinning by Ralf Westphal, who is well-known e.g. for his association with the Clean Code Developer initiative. His premise was that today’s business reality is a constant change of priorities, which conflicts with assuming bigger time boxes during which developers can focus on some goal. His answer is that we should ‘seize the day’ (as Dan North might have put it) and deliver ‘value’ daily. It’s important that this value should be something worthwhile to the customer and that the customer should be able to give feedback daily as well. I’m wondering whether it is really always possible to accomplish this, e.g. fixing a bug might require analysis well beyond a day. But even if you cannot always live up to the idea, it might still be a worthwhile guideline for organizing work. Another question which I don’t have an obvious answer for is who should be in charge of deciding what should be worked on on any given day. Ralf required a thorough triage to be carried out to avoid wasting time, but it’s unclear whether this job belongs e.g. in the hands of the product owner.

Dominik Maximi spoke next about ‘Hostile waters’, i.e. how company culture might influence your chances of and approaches to introducing agile in an organization. He made the important point that every company culture exists because it is (or was in the past) successful to work like that. This needs to be respected when you want to change something fundamentally. He then gave a nice overview of the Schneider model on how to classify company culture. Non-representative survey results indicate that ‘agile’ has similar characteristics to a ‘collaborative’ or ‘cultivation’ culture, but doesn’t fit in so nicely with a ‘competence’ or ‘control’ culture (no surprises here). Changing the mindset of a company might take up to 7-10 years. Dominik finally discussed John P. Kotter’s work on change steps to implement agile.

Continue reading "Goto conference Zürich 2013"

Posted by Holger Schauer

Mar 29

The first real programming task on CodeKata is Kata 2, Karate Chop. Or as the introduction says:

A binary chop (sometimes called the more prosaic binary search) finds the position of value in a sorted array of values.

I started out with the simplest approach I could think of: there just might be a function already readily available in Clojure which solves the problem. After all, Common Lisp has position whereas Pythonistas would use the index method on lists. I couldn’t really find a function on Clojure sequences which would immediately take care of the issue, but keep-indexed seems close enough: taking a function and a collection, it calls the function, which ought to take an index and a value, and keeps the function’s non-nil return values. This led to the following code:

(defn chop [x coll]
    (let [result
            (keep-indexed
               (fn [idx item]
                 (when (= item x) idx))
               coll)]
        (if (empty? result)
            -1
          (first result))))

Some points may be of interest here:

  • Looking over some code, I ran into the usage of %1 etc. to refer to implicit arguments, which I didn’t know about when I started out. This version screams ‘Common Lisp’ pretty much all over.
  • I also ran into the empty sequence <> false issue, of course, and had to look up ‘empty?’ as well.
  • Using when instead of if when you only care about the true state is an idiom I know and love from CL. The if part at the end is still ugly. We could get rid of it by relying on first returning nil for an empty sequence and using this in a boolean comparison.

This leads to the following much shorter version:

(defn chop [x coll]
    (let [result (keep-indexed #(when (= %2 x) %1) coll)]
        (or (first result) -1)))
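Both versions above still scan the collection linearly. For comparison, here is a minimal sketch of an actual binary chop with loop/recur. It is not one of the solutions from this series; binary-chop is a hypothetical name, and the sketch assumes a sorted vector of numbers so that nth is cheap and < is applicable.

```clojure
(defn binary-chop
  "Returns the index of x in the sorted vector coll, or -1 if absent."
  [x coll]
  (loop [low 0
         high (dec (count coll))]
    (if (> low high)
      -1                                    ; search interval is empty
      (let [mid (quot (+ low high) 2)
            item (nth coll mid)]
        (cond
          (= item x) mid                    ; found
          (< item x) (recur (inc mid) high) ; continue in the upper half
          :else      (recur low (dec mid))))))) ; continue in the lower half

(binary-chop 3 [])        ; -1
(binary-chop 3 [1 3 5])   ; 1
(binary-chop 5 [1 3 5 7]) ; 2
(binary-chop 8 [1 3 5 7]) ; -1
```

Each step halves the interval between low and high, which is where the logarithmic running time of the chop comes from.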

The next idea is to use a multi-method approach, dispatching on values (possibly empty ones) and on type, of course. This is an approach which I think should be possible with CLOS, but is quite outside of mainstream object-oriented languages like Java or Python. We could combine this with a recursive approach. One base case of the recursion would be the empty collection, of course, with the other one being finding the searched value and returning the current index in the sequence, which we have to carry around (straightforward recursion). ClojureDocs example 683 has a nice blueprint (http://clojuredocs.org/clojure_core/clojure.core/defmulti#example_683). The result has a nice declarative touch to it, which reminds me of my old Prolog days:

Continue reading "Coding katas Clojure -- Karate chop / binary search"

Posted by Holger Schauer

Mar 29

While not being a kata, setup of the environment in which it’s possible to do the programming for them is a task that needs to be fulfilled anyway. I hence see that as some sort of a separate kata, to familiarize oneself with various development environments.

These are the requirements for setting up a Clojure development environment on Windows, using a portable apps approach. In more detail these are the exact requirements:

  • All portable applications are available on drive J: This is a USB stick in my case.
  • We’ll need git, emacs and clojure of course.
  • We’ll also need leiningen at some later stage.
  • Configuration of the application will be stored on the external drive (e.g. J:) as well where possible.
  • We’ll use nrepl instead of slime/swank. I never got the latter set up to work correctly as portable apps.

This describes the setup I’m currently using. First of all, the downloads. For the stuff that has repositories on github, I would suggest using git clone <repositoryURI> (after you’ve installed git in the first place, of course).

  • Emacs (24.2 at the time of writing) can be downloaded from the FSF Emacs server
  • git (1.7.6 at the time of writing) can be downloaded from msys
  • leiningen requires wget, which can be installed e.g. from here. Another option would be to install MinGW with git and wget, cf. MinGW
  • clojure (1.5.0 at the time of writing) can be downloaded from Clojure.org, of course.
  • leiningen version 2 can be downloaded from the leiningen github repo
  • clojure-mode version 2 can be installed via Marmalade/ELPA or manually from its github repository
  • nrepl.el can also be installed via Marmalade/ELPA or manually from its github repository
  • I’ll throw in magit for smooth Emacs interaction with git, to be fetched via Marmalade/ELPA or manually from magit’s repository

I’ll use the following directory layout: All applications are stored under J:\\progs\, e.g. Emacs 24.2 will end up as J:\\progs\emacs-24.2\. I put the clojure.jar into J:\\progs\\clojure\ and will put lein.bat along with it into the clojure directory. The following shows the resulting directory as shown by dired:

  J:
  insgesamt 5124
  drwx------  3 schauer schauer   16384 Nov  1  2011 emacs
  drwx------ 12 schauer schauer   16384 Okt 16  2012 progs

  J:\\progs:
  drwx------  8 schauer schauer     16384 Nov  1  2011 clojure
  drwx------  8 schauer schauer     16384 Okt  7  2012 emacs-24.2
  drwx------ 11 schauer schauer     16384 Nov  1  2011 git
  drwx------  8 schauer schauer     16384 Nov  1  2011 wget

My Emacs configuration resides in a separate directory on J:, namely in J:\\emacs\. As I already have quite a lot of emacs configuration, I’m going to put all configuration options into separate files, which are placed in J:\\emacs\elisp\config\. Code from other people will go in separate directories as well, with J:\\emacs\elisp\others\ as the top-level folder. clojure-mode hence goes to J:\\emacs\elisp\others\clojure-mode. nrepl.el is a mode but a single file and goes straight into J:\\emacs\elisp\others\. During startup, Emacs looks for default.el or site-start.el to pick up personal or site-wide configuration. Both files can be placed in the site-lisp directory, i.e. in J:\\progs\emacs-24.2\site-lisp\.

  J:\\emacs:
  drwx------  6 schauer schauer   16384 Nov  1  2011 elisp

  J:\\emacs\elisp:
  drwx------ 3 schauer schauer 16384 Nov  1  2011 config
  drwx------ 4 schauer schauer 16384 Nov  1  2011 development
  drwx------ 6 schauer schauer 16384 Nov  1  2011 others

Next we need to adapt the load-path, i.e. where Emacs looks for libraries. This means we need to put some content in J:\\progs\emacs-24.2\site-lisp\default.el that takes care of figuring out the drive letter and sets paths correctly:

(defun get-drive-from-filename (filename)
  "Returns a windows drive letter if filename contains a drive letter."
  (if (string-match "^\\(.:\\)/" filename)
      (match-string 1 filename)))

(defun get-drive-for-emacspath ()
  "Returns windows drive letter for the drive emacs can be found on."
  (get-drive-from-filename (getenv "EMACSPATH")))

(let ((emacsdrive (get-drive-for-emacspath))
       loadpath-additions)
  (dolist (dirname
       '("/emacs/elisp/"
         "/emacs/elisp/config/" 
         "/emacs/elisp/others/"
         "/emacs/elisp/others/clojure-mode/"))
    (setq loadpath-additions
      (cons (concat emacsdrive dirname) loadpath-additions)))
  (setq load-path
    (append loadpath-additions load-path)))

(require 'nrepl)        
(require 'clojure-mode)
(setq clojure-mode-inf-lisp-command 
      (concat (get-drive-for-emacspath)
           "/progs/clojure/lein.bat repl"))


(require 'magit)
(setq magit-git-executable
      (concat (get-drive-for-emacspath)
           "/progs/git/bin/git"))

The next step is to install leiningen. There are two ways: either downloading lein.bat and running it from cmd or downloading lein, the shell script and running it via the git bash prompt. I chose the latter. You will probably need to adjust your path to where you put the lein shell script, e.g. (bash syntax):

export PATH=$PATH:/j/progs/clojure/

To install leiningen locally (i.e. not in your %HOME%), you have to set the LEIN_HOME environment variable, i.e. like this (bash syntax):

export LEIN_HOME=/j/progs/clojure

Remember to always set this variable afterwards before running leiningen commands. Point your classpath to where you installed clojure:

export CLASSPATH=/j/progs/clojure/clojure-1.5.0/clojure-1.5.0

If you don’t want to set all these variables all the time, you can put them either in a .profile file in your %HOME% or in the global profile file that comes with git which resides in /j/progs/git/etc/. I added the following lines:

CLOJUREPATH=/j/progs/clojure 
if test -x $CLOJUREPATH
then 
     export PATH=$PATH:$CLOJUREPATH
     export LEIN_HOME=$CLOJUREPATH
     export CLASSPATH=$CLOJUREPATH/clojure-1.5.0/clojure-1.5.0
else
     echo "Can not access /j/progs/clojure"
     exit 1
fi

Figuring out how to get rid of the hardcoded drive letter in bash is left as an exercise for the reader.

If you also want to keep the files / jars which leiningen retrieves in a local, non-standard maven repository, you need to set a variable in your $LEIN_HOME/profiles.clj file, like this:

{:user {:local-repo "j://progs/clojure/.m2/"
        :repositories  {"local" {:url "file://j/progs/clojure/.m2"
                                  :releases {:checksum :ignore}}}
        :plugins [[lein-localrepo "0.5.2"]]}}

Then run lein self-install. Afterwards, a lein repl should give you a Clojure read-eval-print-loop.

Now if you want to use nrepl and would like to use the support for nrepl/inferior-lisp which comes with clojure-mode you need to add a corresponding dependency to your project.clj for each project, cf. nrepl installation

Posted by Holger Schauer

Dec 26

I had the need to refactor a python package, or rather, I had to rename the package. The package in question is using namespaces, which complicates the matter, as this implies that there are two steps to the process: you have to exchange all references to the module name and you have to re-arrange the package structure. Assuming you’re at the toplevel of your (otherwise clean) package directory, the following shows the manual steps involved in the process.

First we rearrange the directories:

Continue reading "Refactoring (the name of) a python package"

Posted by Holger Schauer

Oct 29

The need for estimates is a fact of (project management) life, whether you’re using a traditional approach or agile ones. The latter try to do away with any exact numbers and instead use anything from t-shirt sizes over gummi bears to story points, and treat the idea of estimates as a tool that is mainly intended for the team. However, as can be seen in this article posted on the ScrumAlliance website on story points versus task hours, the matter of concrete time estimates comes up every now and then: there are customers and managers that want them, and you have to figure out how your estimate / velocity is related to the team’s capacity for any given sprint. TL;DR:

For me, story point is high-level estimation of complexity made before sprint planning.[…]The task-hour estimation, on the other hand, is a low-level estimation[…]should be done during sprint planning for highest possible accuracy.

But, like every other project manager, I have found that there is no such thing as “highest possible accuracy”, regardless of the number of ‘points’ you’re estimating (three-point estimates, anybody? How do they fit in with agile methods?). What does somewhat “work” (for varying degrees of “somewhat”) is measuring how much work is ‘roughly’ possible in an iteration (aka velocity in Scrum) and estimating the ‘entire’ effort according to the law of large numbers, but this is precisely not task-hour estimation. I’ve worked with task-hour estimates during sprint planning as well, but I wouldn’t rely on the results. Exchanging estimates between developers (aka planning poker) mainly has the effect of quickly leading to a common understanding of the tasks that need to be done, which in my experience has much more value than any estimate. The insight that numeric estimates are not all that useful (to say the least) is also discussed in this highly opinionated article on stop using story points: it is not about going back to estimating in hours or days, but about teams reaching a level of competence, expertise and confidence in their respective context where estimates are just waste.

However, this transcendental level might be hard to reach for many teams, due to lack of experience or for other, more context-dependent reasons. Joachim Seibert argues in his nice article “Agile estimating in teams” for the “ObjectSpektrum” magazine (5/12; behind a paywall, unfortunately, and only in German, sorry) that if you need to come up with the big number for budgeting, you should strive to look at the entire picture. I.e., estimating the overall effort is more important than trying to estimate everything down to the tiniest detail, which isn’t possible anyway. Seibert lists two key ingredients for that: estimating a reference story which you stick to throughout the project, and comparing the relative complexity of requirements, essentially sorting them by ‘size’. This is easy to forget with planning poker, but is at the very heart of the two other techniques that Seibert discusses: estimation game (by Steve Bockman, apparently) and magic estimation (by Boris Gloger, AFAICT). Both focus on building a common understanding of the relative complexity of the features to be built and are much simpler to apply to lots of stories. If you add velocity to the picture, i.e. you actively measure how much you can build, and you can reach a stable project setup (compare this nice sweet spot for agile methods by Philippe Kruchten), you might end up with something useful from all the effort you put into estimation.
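
The arithmetic behind combining relative sizes with measured velocity fits in a few lines of Python. This is only a rough sketch: the story names, point values and velocities are made up for illustration, and the helper `forecast_sprints` is a hypothetical name, not something taken from any of the articles mentioned.

```python
import math
from statistics import mean


def forecast_sprints(backlog_points, past_velocities):
    """Roughly estimate how many sprints the remaining backlog needs."""
    velocity = mean(past_velocities)  # average points finished per sprint
    return math.ceil(sum(backlog_points) / velocity)


# Hypothetical backlog, sized relative to a reference story worth 3 points.
backlog = {"login": 3, "search": 5, "checkout": 8, "reporting": 13}
# Hypothetical number of points completed in the last three sprints.
velocities = [9, 11, 10]

sprints = forecast_sprints(backlog.values(), velocities)
print(sprints)
```

The result is only as good as the stability of the velocity, which is why the stable project setup matters more than the precision of any single number.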

Posted by Holger Schauer

Apr 20

One aspect I think is important for a Scrum Master or Project Manager is to make sure that your team doesn’t go on a trip to Vienna (if that term doesn’t ring a bell, search for “Tom DeMarco peopleware”). Quite contrary to popular management belief, I think in general it’s not okay if “occasionally” somebody on the team “puts in some extra work”. There is a reason why many agile methodologies insist on keeping a sustainable pace. Besides all of the very good reasons for making sure your team members stay healthy (see this Burnout story as a negative example), there is also a management point to it: your understanding of what the team is capable of (in terms of results/effort, aka velocity) decreases substantially if you have to take “heroic behaviour” into account. It’s particularly bad when you don’t see the connection between reached goals and the effort involved, i.e. when team members just move their card from “working” to “done” late in the evening without making clear that it took five hours more than initially estimated.

Heroic behaviour just can’t be counted on, because nobody will be able to keep it up over a substantial amount of time (that’s the very definition of not being sustainable). It’s highly understandable that project members, after having committed to some goal, can be tempted to go out of their way to reach it. What team members might miss is that “heroic behaviour” can only influence the “time” aspect of the magic triangle of “time, budget and quality”. Extra effort is just that: effort. Hence, it comes at a cost, namely the cost of what it actually takes to reach the goal. I’ve also seen that there is a misunderstanding of the term “commitment”. It’s not an unconditional promise of “I can do that task with the effort I think it takes”; there is also the implicit condition of “I understand correctly what the task involves and there is no other external negative influence” (e.g. the urgent bug that needs to be looked at, or the lack of sleep during three days of the week due to the kids being sick at home).

Commitment to a particular goal might at times conflict with taking responsibility for the project as a whole. As a general rule of thumb, it’s nearly always much more important to think about the entire project / the big picture than about a small aspect of it. There is the exceptional situation that needs an exceptional reaction and maybe exceptional effort. But it’s important to treat it as an exceptional situation. And for these exceptional situations it’s vital that they get treated like a mini project: they should have a clear purpose and fixed start and end dates. Plus, they should come with compensation. Scrum Masters and project managers alike should communicate clearly that exceptions are exceptions, not the rule. And team members should clearly communicate that it takes what it takes. When it comes to professional work, follow the 501 manifesto (in case, like me, you don’t immediately understand the “501” part: it’s not about jeans, but about leaving at 5:01pm).

ObTitle: Morrissey, from “Ringleader of the tormentors”

Posted by Holger Schauer

Oct 24

In a recent meeting, a colleague of mine mentioned that we wanted to use agile development this time with that new project and that I would provide some insight as an ‘agile development expert’ (not that this would be a term I would use). This in turn brought me some curious looks and a pretty general question during the discussion: what is the benefit of an agile development model from a business perspective anyway?

If I remember correctly, my answer went in the direction that with an agile approach you can change your mind about what you really need/want to implement throughout the project, if you learn something ‘new’ about what is really required. The second important aspect is that you get quicker feedback which allows for more influence in case things don’t turn out right. After the meeting I started pondering the question a little, as I had the impression that my answer wasn’t that convincing, so what you find below is a more detailed written elaboration of it.

Basically, what “you can change your mind throughout the project” boils down to is that agile development gives you more options to make use of opportunities that come up while you implement the project. The most important idea here is that you don’t make decisions too early, because that limits your options. For an example, let’s assume that one feature we specify is a Facebook ‘like’ button on the order page. Now, if we’re following the waterfall-style ‘implementation follows after specification has been completed’ development model, you would specify that, design the page with the button and, at some time in the future, hopefully get your Facebook button. But what if, during the hypothetical nine months in which development happens, Germany’s privacy laws impose severe new restrictions on the use of such buttons and, btw., Google+ is now the new hip site you have to support? You have to go through a change process (typically quite costly with larger features), redo the specification, etc. This essentially makes the cost you had in writing the specification, doing the design etc. a double problem: you didn’t get any ROI out of it, obviously, but additionally time has passed while writing specs, doing design etc. in which you didn’t really work on any results for the customer (lost opportunity: gathering new clients with at least some new features).

So, you would have been better off with having the idea that at some point during the project you want ‘social media integration for marketing reasons’, and delaying the decision about what exactly to do until this is (from a priority point of view) a feature that you really want implemented. I.e., you do the specification when you’re planning to get that feature implemented in the next development phase (which is typically two to four weeks with an agile development team). The idea is to take small, well-defined steps that can be taken fast, starting with the most important/valuable ones, rather than to spend a lot of time on some ‘complete specification’ which withers fast during the project’s implementation time.

The problematic point here is, obviously, that it’s pretty difficult to say when exactly is the right point in time to make a decision. In general, there is no single ‘right point’ (often called the last responsible moment, after which some opportunity is lost; I think the phrase was coined by Mary and Tom Poppendieck, see the excerpt from ‘Lean software development’) but multiple points in time, depending on many aspects. Influencing aspects are things like business opportunities (e.g. a new feature nobody else has at that time) or reducing project risks (cf. Alistair Cockburn reconsidering the last responsible moment). The best answer I have seen so far on when to make a decision is that it should be made when you have ‘enough knowledge’, i.e. when you have carefully considered your options and evaluated them (cf. real options). Even thinking about options alone is typically a highly valuable undertaking in itself, because all too often people just assume that the ‘obvious way’ is the best one.

So, the general answer is probably along these lines: for the business side, agile development has the benefit that it provides more flexibility and more influence throughout the project while at the same time (possibly) generating real value/revenue earlier. There is, of course, more to it that also has appeal from a business perspective, but that’s probably the main point. Feel free to add different opinions.

Posted by Holger Schauer

Apr 4

So, ELS 2011 is over, which was the first conference I attended that was solely aimed at Lisp programmers. Overall I am quite happy with it, although not all talks were of the same quality. In particular, I wasn’t too excited about the three keynotes, although all had interesting topics. The first one, by Craig Zilles about best-effort code optimization, was about things intelligent compilers could do. Very interesting stuff for sure, and I learned a lot about low-level soft- and hardware architectures, but there was no apparent direct relation to Lisp. A similar problem troubled the talk about Scala: perhaps it was due to my late arrival (I got on the right subway but in the wrong direction, not for the first time), but the part I attended left me wondering why Scala is relevant at a Lisp conference. Marc Battyani’s invited talk about his use of Common Lisp for programming FPGAs was nice, but it was difficult to follow (not due to the content), and not many details were given about how the specialized Lisp embedded DSLs get converted to the FPGA-specific code and what problems he had to overcome.

Now for some of the interesting regular talks: on Thursday, the report about porting SBCL to the Blue Gene/P supercomputer was nice and raised an interesting question: what can or needs to be re-discovered from old Lisp dialects for parallel programming, now that more and more parallel CPUs are becoming available to programmers? An issue that came up both in the talk about the futures implementation for ACL2 and in Nicolas Neuss’ initial digression about his experiences in parallelizing Femlisp was that garbage collection can get in the way of effective parallelization, up to the point where much of the expected speedup is lost. The motivation of the talk about Jobim, an actors framework for Clojure, fitted in nicely with a question that immediately came to my mind when I saw how Clojure connects to Java: what do you do so that Java’s semantics don’t leak into your application code? They seem to have found a nice way to abstract away the underlying Java libraries in their framework. In the last session of the first day, the lightning talk session, two things were interesting: Ralf Möller talked about using user-defined method combinations as a more powerful approach than design patterns, showing how one might implement HTML-specific print-object methods, and Didier Verna talked about user-extensible format directives, which he wants to submit as a CLRFI. Having done a lot of work in computational linguistics, I found the talk about S-NLTK, the Scheme toolkit for natural language processing, nice to see. Damir Cavar did a good job promoting the toolkit, which has a similar aim to the Python NLTK, although I would have liked to learn more about the API and implementation issues. Finally, Alec Berryman of ITA gave a last-minute presentation about things ITA learned about optimizing code for SBCL and about issues arising when adapting the old code to multi-threaded programming. Interestingly, they didn’t report any gc issues, but that may be related to their extensive use of caches of pre-allocated objects.

The final panel discussion with James Knight, Christophe Rhodes, James Anderson and Martin Simmons went back and forth about concurrency, distribution and efficiency vs. performance. The discussion took up several points from the talks, including gc and hardware issues. What I took away from the discussion is that, unsurprisingly, a lot of open questions remain to be solved and people are aware of them, while at the same time there doesn’t seem to be much momentum, which, given that the community isn’t that big, isn’t surprising either.

Summing up, I liked it a lot. For one, it was very nice to see people you only know via the net. For another, I think the organizers made a good decision in selecting a main theme for the conference, and an important one, too. It really shaped the talks and the discussions, and hence nicely reached its goal. Generally speaking, the conference was nicely organized, so thanks for a pleasant time in Hamburg.

Posted by Holger Schauer

Mar 29

This has been a rather unpleasant month (don’t ask, I won’t tell), but right now I’m looking forward to its end for two reasons: for one, I’ll be in Hamburg for the European Lisp Symposium for the next two days; the program for the ELS has also been published in the meantime. I’m really looking forward to an interesting set of talks. For another, some patches to CL-SQL which add support for autoincrement behaviour for PostgreSQL are probably going to be released soon. To clarify, “autoincrement” (AUTO_INCREMENT) is a column attribute in MySQL (among others) that automatically increments the value of the column when a new row is inserted and no value for the autoincrement column is given (cf. MySQL docs on AUTO_INCREMENT), a behaviour that PostgreSQL supports with the serial pseudo-type (cf. this wikibook on converting between MySQL and Postgres). Actually, this has been my first substantial amount of Common Lisp programming in the last two years, and it was triggered by an upgrade of my Debian system. This upgrade implied that an old application of mine would now use CL-SQL version 5.0, which in turn broke the app: I had simply specified a db-type of “serial” previously, but the new CL-SQL code wouldn’t recognize that it had to fetch the automatically generated value from the DB when inserting a new record. More details on the patches can be found on the CL-SQL mailing list.
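
To make the behaviour concrete, here is a minimal sketch in Python rather than Common Lisp. It uses the standard library’s sqlite3 module purely as a stand-in for MySQL or PostgreSQL, and the table and column names are made up. It shows the two halves of the problem: the database generates the key, and the client has to fetch the generated value after the insert.

```python
import sqlite3

# In-memory SQLite database as a stand-in; this is not CL-SQL and not PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)

# No value is given for id, so the database generates one.
cur = conn.execute("INSERT INTO person (name) VALUES (?)", ("Alice",))
first_id = cur.lastrowid  # fetch the generated key for the new record

cur = conn.execute("INSERT INTO person (name) VALUES (?)", ("Bob",))
second_id = cur.lastrowid

print(first_id, second_id)
conn.close()
```

With PostgreSQL the generated value of a serial column would typically be retrieved with an INSERT ... RETURNING clause instead of a lastrowid attribute.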

The development of this addition was also the first time I developed with git in a real-world setup. In my own projects I use mercurial, so I was eager to learn a little bit more about the differences. It’s funny that a recent opinionated article, “Why I like mercurial better than git”, more or less talks only about the one point that I found confusing: branch handling. For more background information, I suggest reading the article “A guide to branching in mercurial”. Basically, in my current projects where I use mercurial, I’m using the “branching with clones” approach Steve describes there. When working on the patches for CL-SQL, I was working on the existing autoincrement branch, but when I was through I wanted to port my patches to the master branch. When using mercurial with the described approach, selecting (pulling or pushing) my patches and only my patches to the master branch is dead easy: you just issue a pull/push command restricted to the “right” changesets. Doing this is even supported by Subversion these days via svn cherry picking. Looking at the docs for git pull, fetch and merge, I wasn’t able to figure out what the corresponding “right” incantation for git might look like, if there is one at all. As I didn’t want to hose my “working copy” (sorry for the SVN term again), I resorted to git format-patch and git am, which worked fine. Please note that I’m not suggesting that it’s not possible with another approach; quite to the contrary, I would be happy to learn about it. One thing that I found rather useful is git’s stash command, which lets you safely set aside your current work and fall back to the last committed version in order to work on something that popped up in between (typically a minor unrelated problem you encounter while working on a larger set of changes). I understand that mercurial’s patch queues enable similar functionality, but I haven’t used them so far. Another thing that I found very useful is git’s very easy way to correct (or in git terminology “amend”) a commit by just issuing “git commit --amend”. I also like the idea of the “index” or, more exactly, that you have to explicitly “add” the changes you want to commit. A similar behaviour is possible with SVN’s “changelist” command, but the mere existence of a changelist is not automatically honoured by SVN’s commit.

Posted by Holger Schauer

Sep 22

For the holidays I finally bought Peter Seibel’s Coders at work, which is a very unusual book about programming: it consists solely of interviews with pretty well-known programmers or “coders”. It’s an interesting constellation: on the one hand, Peter Seibel is well known in the Common Lisp community for his book Practical Common Lisp, which gives a modern view on Lisp: not only is it an introduction to the language but also to several libraries and the general setting of modern Lisp programming. On the other (fifteen) hands, there are people like Jamie Zawinski (XEmacs, Netscape), Don Knuth (TeX, Art of Computer Programming), Guy Steele (Lisp, Scheme, Java), Peter Norvig (PAIP, Google), Brendan Eich (JavaScript) and Ken Thompson (Unix), just to name the ones that are probably the most well known.

I had resisted the urge to buy the book because I’ve always felt that programming is a craft that ultimately forces you to gather your own experience. I mean, you can read all the books you like, but ultimately you have to get your own hands dirty to really gain knowledge about the issues involved. So, what could I learn from other people’s experiences? On the other hand, as a lightweight (in terms of reading attention) holiday book it seemed about right, so I finally gave in.

Well, the book turned out to be a real page turner for me. It’s a fascinating read because of the recurring topics Seibel addresses and the various opinions he gets. He addresses topics you would expect, like preferred tools (e.g. editor), worst bugs, debugging techniques, assertions and verification, literate programming (which surprises me a little), design approaches and team work, but of course the main focus is on the personal experiences and on how these guys wound up with whatever made them known. One thing that I liked is that Seibel has a way of asking good follow-up questions to the responses he gets, without ever letting his own experiences or opinions get in the way, which I can imagine made for pleasant interview situations (at least that’s the impression I take away). I wouldn’t have imagined beforehand that I would find the different stories of how the guys (and one woman) got into coding so interesting. There are very few people in this book whose experience doesn’t go back to teletypes and time-sharing systems. Of course, as a result these stories tend to be similar, but the details differ enough that it doesn’t get too boring. Having started with computers in the early 80s, I don’t have any experience with such systems, which I frankly don’t miss at all after reading more about them. But just getting to this conclusion is interesting: the constant comparison with your own experiences and opinions that you can’t help but make while reading this book is alone worth buying and reading it.

Overall, it’s hard to say which interview I found the most interesting; essentially, each has some unique point or other. That being said, the interviews with Joe Armstrong and Guy Steele made quite an impression on me, whereas I’m a little disappointed by the one with Peter Norvig (though he had the funniest quotes), but I can’t really nail down why. I didn’t particularly like the interview with Brad Fitzpatrick; it didn’t seem to contain as much information as the others. And Joshua Bloch seemed to hype Java all the time, which I found not very convincing: the idea that today’s larger context for programming contains quite a few different languages and approaches seems to elude him.

There are some points I took away from this book: for one, most of the interviewed people seem to be much more concerned with data types than I am, even the ones who have done extensive work on dynamically or weakly typed languages. I guess I should really take a closer look at that topic and, to make it more concrete, play around with e.g. Haskell. Another point is that concurrency or parallel programming is a topic that (IIRC) all of the interviewees have seen as being responsible for the worst bugs they encountered, and as a result they are interested in newer approaches like STM. So, it might be worthwhile to look closer into such developments, for example by playing with Clojure, Erlang, or the transaction monad, if I ever really play around with Haskell. A third point is that I realized that I’m not keeping up with academic research in CS and, not having TAoCP, might never have been up to date at all. I’m following a few online references like LtU, but not closely, and it’s pretty rare these days that I look deeply into some research paper. This is something else I should probably change, if time permits.

Posted by Holger Schauer

Nov 9

Via Lambda the Ultimate I came across an interesting article, On data abstraction, revisited by William Cook, written for OOPSLA’09. It carefully distinguishes abstract data types from objects. All theoretical considerations that separate ADTs from objects aside, there is one common characteristic given by Cook: you can’t inspect the concrete representation of the data you’re abstracting. This is in itself interesting and reminded me of two rather practical things.

First of all, I was reminded of a section in Bob Martin’s Clean code development which discusses the idea that you should on the one hand follow the rule “Tell, don’t ask” and on the other hand have data access objects that don’t have much, if any, behaviour besides providing data. This is obviously directly related to Cook’s article: if you want data abstraction, you shouldn’t really provide any way for other objects/methods to access the internal representation. This somewhat also forbids getters, as they are likely to lead to leaky abstractions: more often than not, programmers simply return the value of some data field, directly exposing the chosen representation. Now, please note that this does not necessarily follow from Cook’s article, as it is possible to design getters in such a way that you can return whatever you want from a getter method, i.e., you can return a desired return type or an object satisfying a particular interface. For me, the relevant point here is the way of thinking about the kind of object at hand: do I want some behaviour (aka Cook’s objects) or do I want a data sink? In the former case, and in line with what is suggested in the clean code book, it is arguably best to tell the object to do what is necessary rather than to inspect (get) the data it holds and do the work externally in some other object/method. But even in the latter case, I think it is important to pay great attention to hiding the internal representation from external access and to only allow very focussed access to the data itself. It could be and has been argued that restricting the access to the stored data via getter methods is tedious (see e.g. the discussion in getters/setters/fuxors) and that allowing public access to members is all right, but looking at the issue from a data abstraction point of view, it simply boils down to the question of whether you want or need data abstraction or not.
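
A small Python sketch of the “Tell, don’t ask” idea may help. The `Account` class and its methods are invented for illustration and come neither from Cook’s article nor from the clean code book. The caller tells the object what to do and gets only a narrowly focussed query, so the internal representation (here, an integer number of cents) never leaks.

```python
class Account:
    """Keeps its balance representation private and exposes behaviour."""

    def __init__(self, balance_cents=0):
        self._balance_cents = balance_cents  # internal representation

    def withdraw(self, amount_cents):
        """Tell the account to withdraw; it enforces its own rule."""
        if amount_cents > self._balance_cents:
            raise ValueError("insufficient funds")
        self._balance_cents -= amount_cents

    def can_cover(self, amount_cents):
        """A focussed query that does not reveal how the balance is stored."""
        return amount_cents <= self._balance_cents


account = Account(balance_cents=1000)
account.withdraw(300)          # tell, don't ask
print(account.can_cover(700))  # True
print(account.can_cover(701))  # False
```

The “ask” variant would expose a getter for the balance and leave the comparison to every caller, which is exactly where the chosen representation starts to spread through the code base.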

Second, I’ve recently seen these two postings on the merits of the Zope Component Architecture: The emperors new clothes and the reply The success of the ZCA. Malthe asks why one should use the ZCA to override the use of a particular implementation with another instead of using some kind of reloading (or rather, says that the latter is the preferable approach). Relating this to Cook’s article, Malthe could be paraphrased roughly as: we have ADTs all over the place and we should allow only one implementation per ADT (this is what the type system would guarantee in other systems). If you want another implementation (of some interface, as Cook shows for his objects), you should reload the object definition with the one you want. The use of the ZCA, however, is directly related to the very idea of object-oriented programming in the way Cook defines it: interfaces are the only relevant defining characteristics of objects (values), and hence the ZCA is the way to deal with multiple implementations in Zope (or Python). For me, all I can say is that I’m happy that the ZCA, and hence the ability to easily intermingle multiple implementations, is there (then again, with me reading theoretical computer science articles, I’m arguably not of the angry web designer type for whose benefit Malthe is arguing).

There is another, more puzzling aspect of the article to me. After some consideration, I have to conclude that of all the OO languages I happen to know, it’s really only Java that seems to be object oriented in Cook’s view of the world. This is because in Java, you can define a method to return objects satisfying an interface. In addition, in dynamically typed languages like Python, Ruby, or CLOS, you could try to get away with duck typing, but it’s arguably only Python which tries to take it to heart (for instance in CLOS, most values you’re gonna deal with are non-CLOS values and you even have an ETYPECASE statement, which is a switch statement on type distinction). Funnily enough, Cook finishes his Smalltalk analysis with the statement that “one conclusion you could draw from this analysis is that the untyped lambda calculus was the first object-oriented language”. But besides the question of whether some language is “more OO” than another, there is also the point that in order to program in a truly object-oriented way, you shouldn’t (and in Cook’s world really can’t) rely on type checks, because the whole point of using objects as data abstraction is to rely on behaviour.
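
What relying on behaviour instead of type checks means can be sketched with duck typing in Python. The three classes below are invented for illustration; the only thing they share is that they respond to `contains`. None of them inspects the type or the representation of the others, so different implementations can be mixed freely.

```python
class EvenNumbers:
    """An infinite set, represented only by its membership test."""

    def contains(self, n):
        return n % 2 == 0


class FiniteSet:
    """A finite set, represented by a stored tuple of elements."""

    def __init__(self, *elements):
        self._elements = elements

    def contains(self, n):
        return n in self._elements


class Union:
    """Combines any two objects that respond to contains()."""

    def __init__(self, left, right):
        self._left, self._right = left, right

    def contains(self, n):
        # No isinstance() check: only the behaviour of the parts is used.
        return self._left.contains(n) or self._right.contains(n)


mixed = Union(EvenNumbers(), FiniteSet(1, 3))
members = [n for n in range(6) if mixed.contains(n)]
print(members)  # [0, 1, 2, 3, 4]
```

A version of `Union` that first checked whether its arguments were `FiniteSet` instances could not be combined with `EvenNumbers` at all, which is the practical cost of type checks in this style.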

Posted by Holger Schauer

Oct 22

If you just stumbled over this blog entry searching for what the heck all that buzz about agile methods amounts to, I may have unfortunate news for you: an agile approach to development might work for a lot of people all over the world, but maybe not for you. For starters, take a look at the agile manifesto and its principles. These have quite a lot of implications, directly translating into presuppositions about you, if you want to participate in an agile development project.

Now, perhaps you’re the well-communicating, team-oriented developer type who happens to work in surroundings which you respect and where you generally feel well motivated. But real-life experience with your typical geek or ex-geek-turned-professional-developer suggests a more likely scenario: having a hard time with direct, face-to-face communication that happens very often and involves those pesky guys and girls you don’t really like (“Eeek, those business and marketing people”, anybody?). Now, you might say: baah, I know some communication is a necessary evil of professional software development, but maybe I can get away with my old habit of trading documents (specs, bug reports, etc.) instead of communicating directly, since that is the trusted old way, and what do you mean by face-to-face communication being more effective? Everybody hates those horrible meetings eating up all your time, right?

The point is that if you want to go agile, you had better adapt your view soon, or else you won’t see any of the promised benefits of all of those agile processes. Agile development is not only about using test-driven design, timeboxing, iterations and releasing often. More than anything else, agility is about how people interact with each other effectively, as quickly and directly as possible, in order to come to solutions. To me, this boils down mainly to three issues: adaptability to (changing) situations, an open mind toward communication and taking responsibility for the project at hand.

Being lazy, sticking bone-headedly to “trusted paths”, not seeking confrontation when it’s needed, avoiding communication because of being too busy “doing”: these are the anti-patterns to agility. You could keep a product and sprint backlog, present new features to your customer every other week and even use continuous integration, and you would still be muddling in pseudo-agile, doomed-to-fail waters. What I would suggest is that you revisit the agile manifesto and think about what those principles might imply for you and your current behaviour. Let’s try an example: if you find that you might end up pointing fingers at your customer (or product owner, to use the terminology from Scrum) because he won’t behave as demanded in the agile manifesto, you should reconsider whether that fits in with what the agile manifesto demands of you: you wouldn’t do the project any good. You would build up frontiers where there should be a single team including the target of your pointed finger. This is not to say that you shouldn’t point out the defect; a lack of communication about problems is another all too familiar reason for frustration and lack of commitment. What is necessary is direct communication and finding a way that works for all people involved.

Let me sum up this posting: if you’re interested in agile approaches to software development, that’s fine. If you don’t feel comfortable about what this might imply for you, that’s fine too. But it might also tell you that agile development approaches might not work for you.

Update: Ironically, via InfoQ I stumbled upon the one essential agile ingredient, which says it all much more nicely than I ever could have. The only difference is that Mark Schumann shows confidence that agile works for the most part, whereas I hold the more pessimistic view that it’s not going to work for some, but for exactly the same reasons.

ObTitle: Hermann Hesse, “Steppenwolf”

Posted by Holger Schauer

Jul 24

What do version control and testing have to do with each other? Well, first of all, both are common virtues in the clean code community. What you’ll find is that both virtues are important in their own right: version control provides a safety guard in that you can roll back to prior versions if you accidentally introduce problems in your code. Testing (automated unit tests) provides a safety guard, too, because you can do regression testing when you work with your code. These are both fine goals, but they seemingly have little to do with each other.

But in reality they do. For the sake of argument, let’s take a step back and assume that you have to work in an environment of several developers where neither of these things exists. What will you likely see? What we all saw several years ago: commented-out code blocks, redundant and often misleading or outdated comments, and timestamps with comments cluttered all over the code. And frightened developers who fear each minor change because of the myriad subtle side effects it might have, let alone major changes to core components. It’s an environment in which refactorings as well as extensions are very hard and expensive, which results in frightened, overworked developers and frustrated managers.

So, what happens when you introduce only one of those virtues? Say, we introduce version control. Now, every change gets documented, except that documenting every change requires, from the developers’ point of view, documentation at the wrong point. They can’t see the documented changes and the reasons for these changes in the source; they see them only in the version control system, and only if they add a change message with every change at all. Much more likely is that you will see commit messages such as “.” or “bug fix”, and the same old mess of timestamps, outdated comments and commented-out code as before. Why is that? Because your developers are now not as frightened as they used to be (because they can now rely on the version control system to fall back to older versions), but they still have the same need to understand and document the code. And the commit log is both “too far away” from the code and “out of its purpose” for this task: the commit log shouldn’t document what the code is supposed to do, only when something was implemented to behave in a particular way.

This is where a development (unit) test suite comes into the picture: you document every required behaviour in tests. With every change to the code, you also update the test. As a developer, you can now look into your test suite to see what the code is supposed to do. Now developers will likely become much more confident with their changes, because they can run the tests and see what happens (hopefully next to immediately) without requiring time- and resource-consuming manual tests.
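To make the idea of tests as documentation a bit more concrete, here is a minimal sketch in Python using the standard `unittest` module. The function `smallest_spread` and the sample rows are purely hypothetical and only serve as an illustration; the point is that each test name states one required behaviour, so a developer can read the test class instead of digging through comments or the commit log.

```python
import unittest


def smallest_spread(rows):
    """Return the label of the row with the smallest (maximum - minimum) spread.

    Hypothetical example function; rows is a list of (label, maximum, minimum).
    """
    if not rows:
        raise ValueError("no rows given")
    # min() returns the first minimal element, so the earliest row wins a tie.
    label, _, _ = min(rows, key=lambda row: row[1] - row[2])
    return label


class SmallestSpreadBehaviour(unittest.TestCase):
    """Each test documents one required behaviour of smallest_spread."""

    def test_returns_label_of_row_with_smallest_spread(self):
        rows = [("1", 88, 59), ("2", 79, 63), ("3", 77, 55)]
        self.assertEqual(smallest_spread(rows), "2")

    def test_first_row_wins_when_spreads_are_equal(self):
        rows = [("a", 10, 5), ("b", 20, 15)]
        self.assertEqual(smallest_spread(rows), "a")

    def test_empty_input_is_an_error(self):
        with self.assertRaises(ValueError):
            smallest_spread([])
```

Saved in a file, the tests can be run with `python -m unittest <file>`. If a required behaviour changes, the corresponding test changes in the same commit, which keeps the "what should the code do" question answered next to the code.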

But what about documenting the changes to the code? Well, you should simply document any changes in the commit message of your version control system, because it’s now no longer necessary to keep the entire version history in mind to understand what the current code state is supposed to do. You have the tests that tell you what the code should do. The commit log now only serves the purpose of documenting what has changed over time and is no longer required to understand what the code should do. So you don’t have to keep the clutter in your code, resulting in much cleaner source code files.

Summary: Taken together, version control and testing add up to more than the sum of their individual values.

Posted by Holger Schauer

