Dec 2

The problem

I was recently reading a nice German book on Effective Software Archictecture by Gernot Starke and stumbled upon a discussion of the dependency inversion principle, which got me thinking. Gernot Starke first discusses the problem with an allusion to traditional procedural programming (translation mine):

Classical designs for procedural languages show a very characteristic structure of dependencies. As (the following figure) illustrates, these dependencies typically start from a central point, for instance the main program or the central module of an application. At this high level, these modules typically implement abstract processes. However, they are directly depending on the implementation of the concrete units, i.e. the very functions. These dependencies are causing big problems in practice. They inhibit changing the implementation of concrete functions without causing impacts on the overall system.

Classical dependencies in procedural systems He then goes on to introduce the idea of programming against abstractions and introduces the idea of the dependency inversion principle, first coined in Bob Martin’s DIP article (see also another thorough discussion in Brett Schucherts article on DIP). Basically, the idea is that the integrating process refers only to abstractions (i.e. interfaces) which are then implemented by concrete elements (classes), cf. the next figure.

Integrate with abstractions When I take a look at some of my recent Clojure code or at some older code I’ve written in Common Lisp, I immediately recognize dependencies that correspond to those in a classical procedural system. Let’s go for an example and take a look at one specific function in kata 4, data munging:

(ns kata4-data-munging.core
    (:require [kata4-data-munging.parse :refer [parse-day]]
              [clojure.java.io :as 'io]))

(defn find-lowest-temperature
    "Return day in weatherfile with the smallest temperature spread"
    [weatherfile]
    (with-open [rdr (io/reader weatherfile)]
         (loop [lines (line-seq rdr) minday 0 minspread 0]
        (if (empty? lines)
            minday
            (let [{mnt :MnT mxt :MxT curday :day} (parse-day (first lines)) ;<-- dependency!
              curspread (when (and mnt mxt) (- mxt mnt))]
            (if (and curday curspread
                  (or (= minspread 0)
                  (< curspread minspread)))
               (recur (next lines) curday curspread)
               (recur (next lines) minday minspread)))))))

The dependency here is on the concrete implementation of parse-day, you can basically ignore the rest for the argument here. Given that this was a small coding kata, this is not unreasonable (and in the course of the kata, the code changes to be more general), but the issues here are obvious:

  • if we would like to parse a weather-file with a different structure, we have to change find-lowest-temperature to call out to a different function,
  • if the result of the new function differs, again we have to change the implementation of find-lowest-temperature,
  • we also have to change the namespace declaration, i.e. we probably want to require a different module.

Clojure’s built-in solutions

The application of the dependency inversion principle is typically shown in the context of object-oriented programming languages, like Java where you use interfaces and classes implementing those interfaces for breaking the dependency on concrete implementations, cf. the figure above again. But as we’ll see the principle can be applied independently of object-orientation. I’ll discuss higher-order functions, protocols and multimethods as potential solutions.

Higher order functions

For starters and probably painfully obvious is to make use of the fact that Clojure treats functions as first-class objects and supports higher-order functions. This simply means that we can pass the parsing function as an argument to find-lowest-temperature.

(defn find-lowest-temperature
    "Return day in weatherfile with the smallest temperature spread"
    [weatherfile parsefn] ; <-- function as parameter
    (with-open [rdr (io/reader weatherfile)]
         (loop [lines (line-seq rdr) minday 0 minspread 0]
        (if (empty? lines)
            minday
            (let [{mnt :MnT mxt :MxT curday :day}  (parsefn (first lines))
              curspread (when (and mnt mxt) (- mxt mnt))]
            (if (and curday curspread
                  (or (= minspread 0)
                  (< curspread minspread)))
               (recur (next lines) curday curspread)
               (recur (next lines) minday minspread)))))))

This way, we can simply call (find-lowest-temperature "myweatherfile" parse-day) and freely substitute whatever file format and accompanying parse function we need. What does this buy us?

  • We no longer have to modify find-lowest-temperature when we want to use a different parse-day function.
  • The namespace containing find-lowest-temperature also no longer requires the (namespace containing the) parse function.

But there is also a down-side: find-lowest-temperature assumes that all parsing functions it will get fed adhere to a signature that is entirely implicit: parsefn needs to take exactly one line and needs to return a map with given key-names. Higher-order functions don’t provide a solution for this per-se, so in order to solve the implicit signature issue we need to look elsewhere. This is nothing Clojure specific: Assuming you’ve passed in an object either as a method parameter or via Setter-Methods or Constructor-Injection (cf. dependency injection), Python’s or Ruby’s duck-typing basically works the same way: the caller of a method simply assumes that the callee offers a method with the right signature. It is the responsibility of the caller (of find-lowest-temperature) to provide a matching function for parse-fn.

However, this actually amounts to just move the problem from one level to the next: now some other level has to decide which concrete parse function needs to be used. This next level will have again the exact same problems: it will depend on both concrete implementations of find-lowest-temperature and parse-day (or any other parse function). If you think this through, it’s obvious that in general at one point or another, you have code that determines which function to call and which parameters to use. The question is only if we can use abstractions or whether we have to use concrete implementations. We’ll return to this issue, that now at some other level you need to handle the problem, later.

Protocols

The key to solve the signature issue is that a function signature is basically a contract between caller and callee. Clojure provides a way to make such contracts explicit, using so called protocols. A protocol is a named set of named methods and their signatures, defined using defprotocol. A protocol definition is actually more a specification or a declaration of methods, which needs to be implemented by one of multiple ways (cf. the official Clojure documentation). Let’s take a look at another example, taken from the bloom filter kata. I defined a protocol for how a bloom-filter needs to be implemented, i.e. I declare such an implementation needs at least bloom-bit-get and a bloom-bit-set operations:

(defprotocol BloomFilterImpl
    (bloom-size [filter])
    (bloom-bit-get [filter position])
    (bloom-bit-set [filter position value]))

Note that this is just a declaration, not an implementation. I provided three different implementations for bloom-filters, here are two of them. The first one uses reify to construct an object which provides implementations for the methods of the protocol:

(defn make-bloom-vector [size]
    (let [bloom-vect
          (ref (into (vector-of :boolean)
             (take size (repeatedly #(identity false)))))]
       (reify BloomFilterImpl
           (bloom-size [_ ]
               size)
           (bloom-bit-get [_ position]
               (nth @bloom-vect position))
           (bloom-bit-set [_ position value]
               (alter bloom-vect assoc position value)))))

The other implementation extends an existing data type, a java.util.BitSet to implement the protocol, essentially allowing any bit set to be used as a bloom filter, which is quite an amazing capability (which is also akin to Ruby’s or Python’s monkey patching).

(extend-type BitSet
    BloomFilterImpl
        (bloom-size [filter]
            (.size filter))
        (bloom-bit-get [filter position]
            (locking filter
              (.get filter position)))
        (bloom-bit-set [filter position value]
            (if (< position (bloom-size filter))
               (locking filter 
                  (.set filter position value))
               (throw 
                 (IllegalArgumentException. 
                    "position outside of bloom 
                filter size")))))

And this is the code for using it:

(defn bloom-add [bloom charseq hashfns]
    (let [size (bloom-size bloom)]
        (doseq [hashval (hash-string charseq :hashfns hashfns)]
            (bloom-bit-set bloom 
                 (abs (mod hashval size)) true))
        bloom))

(defn bloom-contains? [bloom charseq hashfns]
    (let [size (bloom-size bloom)
        hashvals (hash-string charseq :hashfns hashfns)]
         (every? #(= (bloom-bit-get bloom (abs (mod % size))) true)  hashvals)))

Note that these functions are completely independent of the concrete bloom-filter implementation that is used:

  • bloom-add and bloom-contains? refer to the respective methods of the BloomFilterImpl protocol, not to the concrete methods,
  • this implies that the module containing bloom-add and bloom-contains? only needs to refer the module containing the protocol definition, not the modules containing the concrete implementation,
  • the concrete implementation of bloom-bit-get and bloom-bit-set can live in arbitrary modules / namespaces,
  • the protocol makes the function signature of bloom-bit-get and bloom-bit-set explicit.

TL;DR: Using a protocol is an application of the dependency inversion principle: the dependency is on the abstract protocol only, on which the concrete implementations depend as well.

Protocols in ClojureScript have some differences to their pure Clojure counterparts, check the official page for differences between ClojureScript and Clojure.

Interfaces

Now, if you’ve used Java before, you might notice the similarity to Java’s interfaces and indeed protocols hook into them. From the official documentation for protocols:

defprotocol will automatically generate a corresponding interface, with the same name as the protocol, i.e. given a protocol: my.ns/Protocol, an interface: my.ns.Protocol. The interface will have methods corresponding to the protocol functions, and the protocol will automatically work with instances of the interface.

Clojure also provides definterface and gen-interface as a more low-level interface to Java interfaces. They are used similarly to protocols. Note that protocols provide several advantages over plain interfaces, cf. the motivation section on protocols of the official documentation.

Multimethods

But we’re not done yet. Clojure’s multimethods, which are similar to Common Lisp generic functions, can be used to apply the dependency inversion principle, too. A Clojure multimethod is a combination of a dispatching function, and one or more methods, to quote from the original documentation. When a multimethod is called in some code basis, the dispatching function will be called with the arguments to produce a dispatching value, which will then select which concrete method is going to be called. This provides polymorphism over a broad variety of options, dispatching over types, specific values etc. It’s specifically useful to provide dispatch on more than just one argument (as is common with Java or Python, for instance, where only the class of the object you call the method on determines which method will be called).

The key for our purposes is understanding that the declaration of the multimethod via defmulti and the concrete implementations via defmethod are only loosely coupled via the dispatching function. I.e., we could just declare the following methods for some dictionary:

(ns dictionary.dict)

(defmulti build-dict 
     "Build up a dictionary"
    (fn [dicttype wordfile capacity]
        (dicttype)))

(defmulti add-word
    "Add a word to a dictionary"
    (fn [dict word]
         (class dict)))

(defmulti dict-contains?
    "Check if word is in the dictionary"
    (fn [dict word]
        (class dict)))

This sets up three multimethods for a dictionary which will dispatch on the class of the concrete data type used to implement the dictionary. We could provide an implementation like this, re-using the bloom-filter functions:

 (ns dictionary.bloom-dict
    (:require [dictionary.dict :refer [build-dict add-word dict-contains?]]
                        [kata5.bloom-filters.core :refer [bloom-build bloom-add bloom-contains?]])

(defmethod build-dict java.util.BitSet [dictionary wordfile capacity]
    (build-bloom wordfile :size (optimal-size capacity 0.1)))

(defmethod add-word java.util.BitSet [dictionary wordfile]
    (bloom-add dictionary word))

(defmethod contains? java.util.BitSet [dictionary word]
    (bloom-contains? dictionary word))

And of course, we could provide an implementation build upon a different data type, using e.g. a trie instead of a bloom filter. Again, any code that would use a dictionary can just happily use add-word and contains? without ever knowing or caring about which data type is used to implement the dictionary and we can happily switch the implementation. But it is worth noting that we’re here not really making good use of multimethods here: we’re basically applying them here for a single argument, class-based dispatch. Hence we would probably be better of using protocols, as this is what protocols are designed for. Multimethods also come with a tiny drawback in comparison: where protocols are declarations only, multimethods are tied to the concrete dispatching function you supply with defmulti. This implies that you will only ever be able to provide implementations for whatever value your dispatching function can return, which could potentially not be compatible with what you need in the future.

Dependency Injection

So Clojure provides some ways to disentangle functional dependencies using the dependency inversion principle. However, as discussed above, at some point we need to make some concrete choices:

  • For higher-order functions, we have to supply the concrete function that will be called.
  • For protocols, we have to reify a concrete object or to create an object of a concrete type that is extended to adhere to the protocol. In the kata5 code, I supply a build-bloom function that either takes an existing object (possibly generated by one of the make-bloom- functions for reified objects) or creates a BitSet by default.
  • For multimethods, we have to supply arguments to the method so that the dispatching function can be used to select the right concrete method. E.g., in the example above, the build-dict method expects a type as a first parameter and somewhere in the codebase I need to call it and re-use the resulting dictionary with each call to add-word or contains?.

This amounts to the fact that we’ve basically shifted the responsibility of making a concrete choice from one point (the function/module which calls the dependency) to another. It’s no wonder then that in object orientated programming, the dependency inversion principle is nearly always discussed along with dependency injection. There are a number of excellent articles on DI in Clojure already:

Now if you take a look at these articles in order, you’ll notice that the first two are mainly concerned with managing state (objects), e.g. managing database handles, connection pools or caches. There is nothing wrong with that, but it’s not what we’ve been up to here. To emphasize the “nothing wrong” part, it’s quite obvious that for the dictionary example, using the component library to manage object creation and destruction (i.e. the start/stop life-cycle discussed by Stuart Sierra) is a very plausible way to go: a dictionary is just a stateful resource that we would like to keep the dictionary around through the application. That’s exactly what the component library (or Alex defrecord based approach for defining the environmental context) is good for.

However, we could also use another classical approach to solve the dependency injection problem: using a service locator, also called registry (both by Martin Fowler). With the rise of container-based DI solutions, the service locator pattern is certainly no longer en vogue, but for our purposes here we can put it to good use (cf. the discussion of pros and cons of either approach in Martin Fowler’s article linked above). Let’s go back to our original example:

(ns kata4-data-munging.core
    (:require [kata4-data-munging.parse :refer [parse-day]]
                        [clojure.java.io :as 'io]))

(defn find-lowest-temperature
     "Return day in weatherfile with the smallest temperature spread"
    [weatherfile]
    (with-open [rdr (io/reader weatherfile)]
          (loop [lines (line-seq rdr) minday 0 minspread 0]
        (if (empty? lines)
             minday
            (let [{mnt :MnT mxt :MxT curday :day}  (parse-day (first lines))
              curspread (when (and mnt mxt) (- mxt mnt))]
            (if (and curday curspread
                   (or (= minspread 0)
                  (< curspread minspread)))
               (recur (next lines) curday curspread)
               (recur (next lines) minday minspread)))))))

We could just as well handle the dependency on parse-day like outlined below: let’s use a simple map as our service locator. We’ll then implement parse-day as a (sort-of abstract) function which does nothing else then use this map to look up the real implementation of parse-day and calls it. Here’s the beef:

(ns kata4-data-munging.interface
    (:require [kata4-data-munging.service :refer [\*services\*]]))

(defn parse-day [line]
    ((:parse-day \*services\*) line)

And then we’ll have the service locator itself, which we could of course extend to include whatever function we might want to “inject”. The module kata4-data-munging.parse would contain the concrete implementation to use.

 (ns kata4-data-munging.service
    (:requre [kata4-data-munging.parse :refer [parse-day]]))

 (def ^:dynamic  \*services\*
    {:parse-day parse-day})

This way we can solve our original problem:

  • find-lowest-temperature doesn’t depend on the concrete implementation anymore, as we’ve introduced a configuration layer in between
  • hence the namespace containing find-lowest-temperature also doesn’t need to refer the concrete implementation
  • we can now exchange the implementation of parse-day by just changing the configuration of *services* (and by binding the dynamic variable we can do so also easily for testing purposes, cf. the articles by Joseph Wilks linked above)
  • all such configurations can be put into a single, central place.

It’s quite clear that this approach also has some drawbacks:

  • the idea that interface/parse-day is an abstraction is an illusion in reality: it’s just another concrete function that has a direct dependency on the concrete implementation of the service locator
  • the service locator will have dependencies to each and every service provided (however, this is quite typical, e.g. Guice bindings or the need to specify all mappings from abstractions to implementations in an XML file in some other framework)
  • the dependency is resolved at run-time, not at compile time which might trigger unexpected errors
  • the signature of the parse-day function is still implicit only. However, we could simply extend the approach to using protocols.

Update: Dan Lentz reminded me in a Google+ discussion that there is also the vinyasa library that provides injection capabilities. This tool relies heavily on leinigen, in particular on the :injections functionality on which there is not much documentation to be found besides a brief explanation in the sample project.clj:

Forms to prepend to every form that is evaluated inside your project. Allows working around the Gilardi Scenario: http://technomancy.us/143

Using lein’s injections feature, we can for instance make the bloom-filter based dictionary implementation available via an injection:

(defproject dictionary "0.1.0-SNAPSHOT"
   :dependencies [[org.clojure/clojure "1.6.0"]
                               [kata5-bloom-filters "0.1.0-SNAPSHOT"]]
   :main ^:skip-aot dictionary.core
   :target-path "target/%s"
   :injections [(require '[dictionary.bloom-dict])]
   :profiles {:uberjar {:aot :all}})

Our “main application” then doesn’t have to refer to the concrete implementation at all, because the injection makes the methods from ‘dictionary.bloom-dict’ available.

(ns dictionary.core
   (:require [dictionary.dict :refer [build-dict dict-contains?]])
   (:gen-class))

(def ^:dynamic *wordfile* "wordlist.txt")

(defn -main
  [& args]
  (let [dict (build-dict java.util.BitSet *wordfile* 50000)]
     (if (dict-contains? dict "Zulu")
         (println "Found zulu")
         (println "Word missing"))
  (println "Good bye!")))

Conclusion

Clojure provides quite a number of possible ways to disentangle dependencies. Besides the ubiquitous higher-order functions, protocols are the premier tool to use. When it comes to injecting dependencies, the Clojure community has come up with a range of solutions, from smaller to bigger ones.

Posted by Holger Schauer

Defined tags for this entry:

1 Trackbacks

  1. Not lost but found

    Fun with function signatures
    In a blog post on dependency inversion in Clojure I’ve discussed what this DI principle actually is about and the solutions Clojure offers to support it. There is one aspect that bugged me a little: For me, a fundamental challenge with DI in a langu

5 Comments

Display comments as(Linear | Threaded)
  1. Marton says:

    Dear Holger,

    A quite thorough article with many links! I’ve enjoyed it much, thank you!

    I’d like to comment on the first di solution - higher order functions since it is the simpliest most powerful way to achive di without any fancy framework/language feature/half measure in my opinion. To be honest I have not much experience with the others so cannot comment on this in comparison just in itself. I used this technic as a sole tool of DI in my latest project with overwhelming success.

    The idea is not use simply longer parameter list with the function dependencies but use let-over lambda and split the parameter list to a first portion which is about the dependencies and a second portion which is about real parameters for work. So you will have factory functions basically which cannot be used as is so you have to have a distinct(!) layer in your code where you factor the functionality you want to achive. That is the place where all the dependencies will be managed, where DI kicks in. This is waht we do in xml/annotations if we use spring DI in java…and all other code is untouched, clean, simple, unittestable.

    Some comments on the downsides you mentioned: "implicit signature": I do not think this is a problem of the DI approach. It is more an aspect of a basic clojure property: it is a dynamic language. I used the above approach for DI in scala in my latest projects and there you have the helping hand of compile time types thus this issue simply not exists. But it is not only about function signature and type safety but about concrete functionality as well! Furthermore I do not think this is a problem at all since not the caller has to provide the correct functionality but the factory code - which has this as a sole responsibility.

    "this actually amounts to just move the problem from one level to the next": I think it is selfexplanatory that this way you will have clean let-over-lambda code which does not care about implementation, decoupled testable etc. AND you will end up a code which is strictly and ONLY about concrete implementations and injections of dependencies. There you will have your proper level where you can make the wirings.

    I hope it make some sense :) I only felt to reply on this one since DI is one of the most important aspects of SE in my opinion and I searched long until I found this true and effective DI approach based on pure FP.

    If you want I would be happy to rewrite your examples that way.

    Best Regards, Marton

  2. Holger Schauer says:

    Marton, I’m aware of Doug Hoyte’s book Let over lambda, but I’m not sure what you are actually referring to when you say "let over lambda approach". I guess I understand roughly what you’re after, but feel free to add an example (you can use markdown syntax in the comments).

    I agree that the implicit function signature is due to the dynamic nature of Clojure, but it’s still an important aspect to be aware of when using higher-order functions: signature mismatch is a common source of errors and one which in Clojure will only show up at run-time (in contrast to, say, Scala, as you discussed). It wasn’t meant as an argument against using higher-order functions, though.

  3. Marton says:

    Here’s a little working example.

    (defn- find-lowest-temperature
      &quot;Return day in weatherfile with the smallest temperature spread&quot;
      [parse-day] ; --- dependencies, factory part (let)
      (fn [weatherfile] ; --- the function instance (lambda)
        (with-open [rdr (io/reader weatherfile)]
          (loop [lines (line-seq rdr)
                 minday 0
                 minspread 0]
            (if (empty? lines)
              minday
              (let [{mnt :MnT mxt :MxT curday :day} (parse-day (first lines))
                    curspread (when (and mnt mxt) (- mxt mnt))]
                (if (and curday curspread
                         (or (= minspread 0)
                             (&lt; curspread minspread)))
                  (recur (next lines) curday curspread)
                  (recur (next lines) minday minspread))))))))
    
    (defn- high-level-function-where-we-use-find-lowest
      [find-lowest-temperature] ; --- dependencies, factory part
      (fn [weatherfiles] ; --- the function instance
        (map find-lowest-temperature weatherfiles)))
    
    (defn- parse-day [line]
      (let [splitted (str/split line #&quot;\s+&quot;)
            [min max curday] splitted]
        {:MnT (Double. min)
         :MxT (Double. max)
         :day curday}))
    
    (def find-lowests ; factory
      (-&gt; parse-day ; --- this is a threading macro
        find-lowest-temperature ; --- concrete
        high-level-function-where-we-use-find-lowest)) ; --- concrete
    

    I can’t help it for me signature mismatch is simply type mismatch :) I agree on that this is a problem but also a huge degree of freedom. On the other hand if we want to express DI in clojure itself (as in this example) then some typed features would be helpful (the problem what this factory codes should solve is much better suited for a stat typed language)

    Protocols and multimethods are some exotic and thus interesting featrues but I am not sure these are suitable to be used at di. These are typed concepts under the hood and I would be afraid to pull these into a nice pure FP landscape. We all have some (bad) insticts as we see type hierarchies and code together…we’ve all been in OO zombieland :) BTW the one thing I love in clojure that it is not OO! It is so great! You have to be very disciplined in Scala for example not to end up with resurrected OO zombies… As for dynamic binding: looks cool but to be honest I would be afraid to use this everywhere…dinamic scope is certainly not something is good for makeing more understandable code.

    What do you think? Did something worked out for you very good for DI lately?

  4. Holger Schauer says:

    Thanks, Marton, for the example. That’s a very interesting approach. I think I’ve seen it before, although it’s really not an obvious way to go about it. I apologize for the problems with the HTML entities — apparently that’s a problem with the Markdown plugin of S9Y, I wasn’t able to work around it.

    > I can’t help it for me signature mismatch is simply type mismatch

    It goes beyond type mismatch, it’s also argument number mismatch. Maybe it’s just me, but that’s a problem I really run into when refactoring some code, changing the number of arguments, either intentionally or only be silly mistake.

    > Protocols and multimethods are some exotic and thus interesting featrues

    Well, protocols are not much more than interfaces, I wouldn’t really see them as exotic. Multi-methods / multiple dispatch have been used a lot in Common Lisp.

    > These are typed concepts under the hood and I would be afraid to pull these into a nice pure FP landscape. We all have some (bad) insticts as we see type hierarchies and code together…we’ve all been in OO zombieland

    I have no issue with OO. Clojure is not a pure functional programming language and I’m very happy that it allows freedom to use different paradigms where fit. To me, neither FP or OOP are silver bullets. It’s good to be aware of the available options and of the associated risks.

    This entire article was the result of exploring and evaluating the options in Clojure. This doesn’t imply that I would use DI method a lot — any DI mechanism adds complexity and I prefer simplicity over additional complexity. Any use of a higher-order function or protocol is (or should be) a clear indication that at this point flexibility (to use a different parse-day function) is a valuable goal. There is a fine line between striving for simple building blocks and over-engineering for flexibility.

  5. Holger Schauer says:

    Martin Fowler has recently published a nice article on refactoring module dependencies which discusses many aspects I touched on, using Java and JavaScript (the latter without using classes). I find it very interesting that he discusses the service locator pattern, too, as an important tool for decoupling dependencies, but even more interesting is that he applies for the discussion of dependency injection the very same technique of partial function application that Marton outlined in the comments. Cool stuff.

Add Comment


Markdown format allowed
Enclosing asterisks marks text as bold (*word*), underscore are made via _word_.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA