The problem
I was recently reading a nice German book on Effective Software Architecture by Gernot Starke and stumbled upon a discussion of the dependency inversion principle, which got me thinking. Gernot Starke first discusses the problem with an allusion to traditional procedural programming (translation mine):
Classical designs for procedural languages show a very characteristic structure of dependencies. As (the following figure) illustrates, these dependencies typically start from a central point, for instance the main program or the central module of an application. At this high level, these modules typically implement abstract processes. However, they are directly depending on the implementation of the concrete units, i.e. the very functions. These dependencies are causing big problems in practice. They inhibit changing the implementation of concrete functions without causing impacts on the overall system.
He then goes on to introduce the idea of programming against abstractions and the dependency inversion principle, first coined in Bob Martin’s DIP article (see also another thorough discussion in Brett Schuchert’s article on DIP). Basically, the idea is that the integrating process refers only to abstractions (i.e. interfaces) which are then implemented by concrete elements (classes), cf. the next figure.
When I take a look at some of my recent Clojure code or at some older code I’ve written in Common Lisp, I immediately recognize dependencies that correspond to those in a classical procedural system. Let’s go for an example and take a look at one specific function in kata 4, data munging:
(ns kata4-data-munging.core
  (:require [kata4-data-munging.parse :refer [parse-day]]
            [clojure.java.io :as io]))

(defn find-lowest-temperature
  "Return day in weatherfile with the smallest temperature spread"
  [weatherfile]
  (with-open [rdr (io/reader weatherfile)]
    (loop [lines (line-seq rdr) minday 0 minspread 0]
      (if (empty? lines)
        minday
        (let [{mnt :MnT mxt :MxT curday :day} (parse-day (first lines)) ;<-- dependency!
              curspread (when (and mnt mxt) (- mxt mnt))]
          (if (and curday curspread
                   (or (= minspread 0)
                       (< curspread minspread)))
            (recur (next lines) curday curspread)
            (recur (next lines) minday minspread)))))))
The dependency here is on the concrete implementation of parse-day; you can basically ignore the rest for the argument here. Given that this was a small coding kata, this is not unreasonable (and in the course of the kata, the code changes to be more general), but the issues here are obvious:
- if we would like to parse a weather file with a different structure, we have to change find-lowest-temperature to call out to a different function,
- if the result of the new function differs, again we have to change the implementation of find-lowest-temperature,
- we also have to change the namespace declaration, i.e. we probably want to require a different module.
Clojure’s built-in solutions
The application of the dependency inversion principle is typically shown in the context of object-oriented programming languages such as Java, where you use interfaces and classes implementing those interfaces to break the dependency on concrete implementations, cf. the figure above again. But as we’ll see, the principle can be applied independently of object-orientation. I’ll discuss higher-order functions, protocols and multimethods as potential solutions.
Higher order functions
The first option, and probably a painfully obvious one, is to make use of the fact that Clojure treats functions as first-class objects and supports higher-order functions. This simply means that we can pass the parsing function as an argument to find-lowest-temperature.
(defn find-lowest-temperature
  "Return day in weatherfile with the smallest temperature spread"
  [weatherfile parsefn] ; <-- function as parameter
  (with-open [rdr (io/reader weatherfile)]
    (loop [lines (line-seq rdr) minday 0 minspread 0]
      (if (empty? lines)
        minday
        (let [{mnt :MnT mxt :MxT curday :day} (parsefn (first lines))
              curspread (when (and mnt mxt) (- mxt mnt))]
          (if (and curday curspread
                   (or (= minspread 0)
                       (< curspread minspread)))
            (recur (next lines) curday curspread)
            (recur (next lines) minday minspread)))))))
This way, we can simply call (find-lowest-temperature "myweatherfile" parse-day) and freely substitute whatever file format and accompanying parse function we need. What does this buy us?
- We no longer have to modify find-lowest-temperature when we want to use a different parse-day function.
- The namespace containing find-lowest-temperature also no longer requires the (namespace containing the) parse function.
But there is also a downside: find-lowest-temperature assumes that all parsing functions it gets fed adhere to a signature that is entirely implicit: parsefn needs to take exactly one line and needs to return a map with given key names. Higher-order functions don’t provide a solution for this per se, so in order to solve the implicit signature issue we need to look elsewhere. This is nothing Clojure-specific: assuming you’ve passed in an object either as a method parameter or via setter methods or constructor injection (cf. dependency injection), Python’s or Ruby’s duck typing basically works the same way: the caller of a method simply assumes that the callee offers a method with the right signature. It is the responsibility of the caller (of find-lowest-temperature) to provide a matching function for parsefn.
However, this actually amounts to just moving the problem from one level to the next: now some other level has to decide which concrete parse function needs to be used. This next level will again have the exact same problems: it will depend on the concrete implementations of both find-lowest-temperature and parse-day (or any other parse function). If you think this through, it’s obvious that in general, at one point or another, you have code that determines which function to call and which parameters to use. The only question is whether we can use abstractions or have to use concrete implementations. We’ll return later to this issue of having to handle the problem at some other level.
Protocols
The key to solving the signature issue is that a function signature is basically a contract between caller and callee. Clojure provides a way to make such contracts explicit, using so-called protocols. A protocol is a named set of named methods and their signatures, defined using defprotocol. A protocol definition is really a specification or declaration of methods, which then need to be implemented in one of several ways (cf. the official Clojure documentation). Let’s take a look at another example, taken from the bloom filter kata. I defined a protocol for how a bloom filter needs to be implemented, i.e. I declare that such an implementation needs at least bloom-size, bloom-bit-get and bloom-bit-set operations:
(defprotocol BloomFilterImpl
(bloom-size [filter])
(bloom-bit-get [filter position])
(bloom-bit-set [filter position value]))
Note that this is just a declaration, not an implementation. I provided three different implementations for bloom filters; here are two of them. The first one uses reify to construct an object which provides implementations for the methods of the protocol:
(defn make-bloom-vector [size]
  (let [bloom-vect
        (ref (into (vector-of :boolean)
                   (repeat size false)))]
    (reify BloomFilterImpl
      (bloom-size [_]
        size)
      (bloom-bit-get [_ position]
        (nth @bloom-vect position))
      (bloom-bit-set [_ position value]
        ;; alter must run inside a transaction
        (dosync (alter bloom-vect assoc position value))))))
The other implementation extends an existing data type, a java.util.BitSet, to implement the protocol, essentially allowing any bit set to be used as a bloom filter, which is quite an amazing capability (which is also akin to Ruby’s or Python’s monkey patching).
(extend-type BitSet
  BloomFilterImpl
  (bloom-size [filter]
    (.size filter))
  (bloom-bit-get [filter position]
    (locking filter
      (.get filter position)))
  (bloom-bit-set [filter position value]
    (if (< position (bloom-size filter))
      (locking filter
        (.set filter position value))
      (throw
       (IllegalArgumentException.
        "position outside of bloom filter size")))))
And this is the code for using it:
(defn bloom-add [bloom charseq hashfns]
  (let [size (bloom-size bloom)]
    (doseq [hashval (hash-string charseq :hashfns hashfns)]
      (bloom-bit-set bloom
                     (abs (mod hashval size)) true))
    bloom))

(defn bloom-contains? [bloom charseq hashfns]
  (let [size (bloom-size bloom)
        hashvals (hash-string charseq :hashfns hashfns)]
    (every? #(= (bloom-bit-get bloom (abs (mod % size))) true) hashvals)))
Note that these functions are completely independent of the concrete bloom-filter implementation that is used:
- bloom-add and bloom-contains? refer to the respective methods of the BloomFilterImpl protocol, not to the concrete methods,
- this implies that the module containing bloom-add and bloom-contains? only needs to refer to the module containing the protocol definition, not the modules containing the concrete implementation,
- the concrete implementation of bloom-bit-get and bloom-bit-set can live in arbitrary modules / namespaces,
- the protocol makes the function signature of bloom-bit-get and bloom-bit-set explicit.
TL;DR: Using a protocol is an application of the dependency inversion principle: the dependency is on the abstract protocol only, on which the concrete implementations depend as well.
Protocols in ClojureScript have some differences to their pure Clojure counterparts, check the official page for differences between ClojureScript and Clojure.
Interfaces
Now, if you’ve used Java before, you might notice the similarity to Java’s interfaces and indeed protocols hook into them. From the official documentation for protocols:
defprotocol will automatically generate a corresponding interface, with the same name as the protocol, i.e. given a protocol: my.ns/Protocol, an interface: my.ns.Protocol. The interface will have methods corresponding to the protocol functions, and the protocol will automatically work with instances of the interface.
Clojure also provides definterface and gen-interface as a more low-level interface to Java interfaces. They are used similarly to protocols. Note that protocols provide several advantages over plain interfaces, cf. the motivation section on protocols of the official documentation.
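For comparison, here is a minimal sketch of the definterface route. The interface IDayParser, its method parseDay and the stub implementation are hypothetical names chosen for illustration; unlike protocol functions, the method is invoked via Java interop.

```clojure
;; Hypothetical interface; the method name follows Java naming conventions.
(definterface IDayParser
  (parseDay [line]))

;; A stub implementation that ignores its input and returns a fixed record.
(def fixed-parser
  (reify IDayParser
    (parseDay [_ line]
      {:day 1 :MnT 59 :MxT 88})))

;; Called through interop, not as a plain Clojure function.
(.parseDay fixed-parser "ignored")
```

An existing type such as java.util.BitSet cannot be made to implement such an interface after the fact, which is one of the advantages protocols have here.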
Multimethods
But we’re not done yet. Clojure’s multimethods, which are similar to Common Lisp generic functions, can be used to apply the dependency inversion principle, too. A Clojure multimethod is a combination of a dispatching function, and one or more methods, to quote from the official documentation. When a multimethod is called somewhere in a code base, the dispatching function is called with the arguments to produce a dispatching value, which then selects the concrete method to be called. This provides polymorphism over a broad variety of options: dispatching over types, specific values etc. It’s specifically useful for dispatching on more than just one argument (in contrast to what is common in Java or Python, for instance, where only the class of the object you call the method on determines which method will be called).
The key for our purposes is understanding that the declaration of the multimethod via defmulti and the concrete implementations via defmethod are only loosely coupled via the dispatching function. I.e., we could just declare the following methods for some dictionary:
(ns dictionary.dict)

(defmulti build-dict
  "Build up a dictionary"
  (fn [dicttype wordfile capacity]
    dicttype)) ; dispatch on the type that is passed in

(defmulti add-word
  "Add a word to a dictionary"
  (fn [dict word]
    (class dict)))

(defmulti dict-contains?
  "Check if word is in the dictionary"
  (fn [dict word]
    (class dict)))
This sets up three multimethods for a dictionary which will dispatch on the class of the concrete data type used to implement the dictionary. We could provide an implementation like this, re-using the bloom-filter functions:
(ns dictionary.bloom-dict
  (:require [dictionary.dict :refer [build-dict add-word dict-contains?]]
            ;; optimal-size is assumed to live in the bloom-filter namespace as well
            [kata5.bloom-filters.core :refer [build-bloom bloom-add bloom-contains?
                                              optimal-size]]))

(defmethod build-dict java.util.BitSet [dicttype wordfile capacity]
  (build-bloom wordfile :size (optimal-size capacity 0.1)))

(defmethod add-word java.util.BitSet [dictionary word]
  (bloom-add dictionary word))

(defmethod dict-contains? java.util.BitSet [dictionary word]
  (bloom-contains? dictionary word))
And of course, we could provide an implementation built upon a different data type, using e.g. a trie instead of a bloom filter. Again, any code that uses a dictionary can just happily use add-word and dict-contains? without ever knowing or caring about which data type is used to implement the dictionary, and we can happily switch the implementation. But it is worth noting that we’re not really making good use of multimethods here: we’re basically applying them for single-argument, class-based dispatch. Hence we would probably be better off using protocols, as this is what protocols are designed for. Multimethods also come with a tiny drawback in comparison: where protocols are declarations only, multimethods are tied to the concrete dispatching function you supply with defmulti. This implies that you will only ever be able to provide implementations for whatever values your dispatching function can return, which might not be compatible with what you need in the future.
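To illustrate where multimethods do pay off, here is a small sketch of dispatch on more than one argument. The multimethod convert-temperature and its unit keywords are invented for this example; the dispatching function returns a vector of the first two arguments, and each defmethod registers for one such pair.

```clojure
(defmulti convert-temperature
  "Convert a temperature; dispatches on the pair [from to]."
  (fn [from to _value] [from to]))

(defmethod convert-temperature [:celsius :fahrenheit] [_ _ c]
  (+ 32 (* 9/5 c)))

(defmethod convert-temperature [:fahrenheit :celsius] [_ _ f]
  (* 5/9 (- f 32)))

;; Fallback for pairs without a registered method.
(defmethod convert-temperature :default [from to _]
  (throw (ex-info "No conversion defined" {:from from :to to})))

(convert-temperature :celsius :fahrenheit 100)
```

New pairs can be added from any namespace with another defmethod, but only for values the dispatching function can actually return, which is the drawback mentioned above.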
Dependency Injection
So Clojure provides some ways to disentangle functional dependencies using the dependency inversion principle. However, as discussed above, at some point we need to make some concrete choices:
- For higher-order functions, we have to supply the concrete function that will be called.
- For protocols, we have to reify a concrete object or to create an object of a concrete type that is extended to adhere to the protocol. In the kata5 code, I supply a build-bloom function that either takes an existing object (possibly generated by one of the make-bloom- functions for reified objects) or creates a BitSet by default.
- For multimethods, we have to supply arguments to the method so that the dispatching function can be used to select the right concrete method. E.g., in the example above, the build-dict method expects a type as a first parameter and somewhere in the codebase I need to call it and re-use the resulting dictionary with each call to add-word or dict-contains?.
This amounts to the fact that we’ve basically shifted the responsibility of making a concrete choice from one point (the function/module which calls the dependency) to another. It’s no wonder then that in object-oriented programming, the dependency inversion principle is nearly always discussed along with dependency injection. There are a number of excellent articles on DI in Clojure already:
- The component library by Stuart Sierra
- “A dependency injection pattern in Clojure” by Alex Miller
- “5 faces of dependency injection in Clojure” by Matthew Smith
- “Isolating External Dependencies in Clojure” by Joseph Wilk
Now if you take a look at these articles in order, you’ll notice that the first two are mainly concerned with managing state (objects), e.g. managing database handles, connection pools or caches. There is nothing wrong with that, but it’s not what we’ve been up to here. To emphasize the “nothing wrong” part, it’s quite obvious that for the dictionary example, using the component library to manage object creation and destruction (i.e. the start/stop life-cycle discussed by Stuart Sierra) is a very plausible way to go: a dictionary is just a stateful resource that we would like to keep around throughout the application. That’s exactly what the component library (or Alex Miller’s defrecord-based approach for defining the environmental context) is good for.
However, we could also use another classical approach to solve the dependency injection problem: using a service locator, also called registry (both by Martin Fowler). With the rise of container-based DI solutions, the service locator pattern is certainly no longer en vogue, but for our purposes here we can put it to good use (cf. the discussion of pros and cons of either approach in Martin Fowler’s article linked above). Let’s go back to our original example:
(ns kata4-data-munging.core
  (:require [kata4-data-munging.parse :refer [parse-day]]
            [clojure.java.io :as io]))

(defn find-lowest-temperature
  "Return day in weatherfile with the smallest temperature spread"
  [weatherfile]
  (with-open [rdr (io/reader weatherfile)]
    (loop [lines (line-seq rdr) minday 0 minspread 0]
      (if (empty? lines)
        minday
        (let [{mnt :MnT mxt :MxT curday :day} (parse-day (first lines))
              curspread (when (and mnt mxt) (- mxt mnt))]
          (if (and curday curspread
                   (or (= minspread 0)
                       (< curspread minspread)))
            (recur (next lines) curday curspread)
            (recur (next lines) minday minspread)))))))
We could just as well handle the dependency on parse-day as outlined below: let’s use a simple map as our service locator. We’ll then implement parse-day as a (sort-of abstract) function which does nothing other than use this map to look up the real implementation of parse-day and call it. Here’s the beef:
(ns kata4-data-munging.interface
  (:require [kata4-data-munging.service :refer [*services*]]))

(defn parse-day [line]
  ((:parse-day *services*) line))
And then we’ll have the service locator itself, which we could of course extend to include whatever function we might want to “inject”. The module kata4-data-munging.parse would contain the concrete implementation to use.
(ns kata4-data-munging.service
  (:require [kata4-data-munging.parse :refer [parse-day]]))

(def ^:dynamic *services*
  {:parse-day parse-day})
This way we can solve our original problem:
- find-lowest-temperature doesn’t depend on the concrete implementation anymore, as we’ve introduced a configuration layer in between,
- hence the namespace containing find-lowest-temperature also doesn’t need to refer to the concrete implementation,
- we can now exchange the implementation of parse-day by just changing the configuration of *services* (and by binding the dynamic variable we can do so also easily for testing purposes, cf. the article by Joseph Wilk linked above),
- all such configurations can be put into a single, central place.
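Swapping the implementation for a test could look roughly like the following single-file sketch. The functions real-parse-day and stub-parse-day and their return values are placeholders made up for this example; in the layout described above, the real function would come from kata4-data-munging.parse.

```clojure
;; Placeholder standing in for the real implementation (hypothetical body).
(defn real-parse-day [line]
  {:day 1 :MnT 59 :MxT 88})

(def ^:dynamic *services*
  {:parse-day real-parse-day})

;; The "abstract" function only looks up and calls the configured service.
(defn parse-day [line]
  ((:parse-day *services*) line))

;; A stub to be injected in tests.
(defn stub-parse-day [_line]
  {:day 42 :MnT 0 :MxT 1})

(parse-day "any line")                     ; uses real-parse-day

(binding [*services* (assoc *services* :parse-day stub-parse-day)]
  (parse-day "any line"))                  ; uses stub-parse-day
```

Outside of the binding form the original configuration is in effect again, so tests do not leak their stubs into other code on the same thread.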
It’s quite clear that this approach also has some drawbacks:
- the idea that interface/parse-day is an abstraction is an illusion: it’s just another concrete function that has a direct dependency on the concrete implementation of the service locator,
- the service locator will have dependencies on each and every service provided (however, this is quite typical, e.g. Guice bindings or the need to specify all mappings from abstractions to implementations in an XML file in some other framework),
- the dependency is resolved at run-time, not at compile time, which might trigger unexpected errors,
- the signature of the parse-day function is still implicit only. However, we could simply extend the approach to using protocols.
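One way the extension to protocols might look is sketched below. The protocol DayParser, the record ColumnParser, the var *parsers* and the line format "day min max" are all assumptions made for this illustration and do not appear in the kata code.

```clojure
(defprotocol DayParser
  (parse-line [parser line]
    "Parse one line into a map with the keys :day, :MnT and :MxT."))

;; Hypothetical implementation for lines of the form "day min max".
(defrecord ColumnParser []
  DayParser
  (parse-line [_ line]
    (let [[day mnt mxt] (map #(Long/parseLong %)
                             (re-seq #"\d+" line))]
      {:day day :MnT mnt :MxT mxt})))

;; The locator now holds an object satisfying the protocol, not a bare function.
(def ^:dynamic *parsers*
  {:day-parser (->ColumnParser)})

(defn parse-via-locator [line]
  (parse-line (:day-parser *parsers*) line))

(parse-via-locator "1 59 88")
```

The protocol makes the method name and argument list explicit; the shape of the returned map is still only documented in the docstring.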
Update: Dan Lentz reminded me in a Google+ discussion that there is also the vinyasa library that provides injection capabilities. This tool relies heavily on Leiningen, in particular on the :injections functionality, on which there is not much documentation to be found besides a brief explanation in the sample project.clj:
Forms to prepend to every form that is evaluated inside your project. Allows working around the Gilardi Scenario: http://technomancy.us/143
Using lein’s injections feature, we can for instance make the bloom-filter based dictionary implementation available via an injection:
(defproject dictionary "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [kata5-bloom-filters "0.1.0-SNAPSHOT"]]
  :main ^:skip-aot dictionary.core
  :target-path "target/%s"
  :injections [(require '[dictionary.bloom-dict])]
  :profiles {:uberjar {:aot :all}})
Our “main application” then doesn’t have to refer to the concrete implementation at all, because the injection makes the methods from ‘dictionary.bloom-dict’ available.
(ns dictionary.core
  (:require [dictionary.dict :refer [build-dict dict-contains?]])
  (:gen-class))

(def ^:dynamic *wordfile* "wordlist.txt")

(defn -main
  [& args]
  (let [dict (build-dict java.util.BitSet *wordfile* 50000)]
    (if (dict-contains? dict "Zulu")
      (println "Found zulu")
      (println "Word missing"))
    (println "Good bye!")))
Conclusion
Clojure provides quite a number of possible ways to disentangle dependencies. Besides the ubiquitous higher-order functions, protocols are the premier tool to use. When it comes to injecting dependencies, the Clojure community has come up with a range of solutions, from smaller to bigger ones.
11/16/2015 07:31:37 PM
Fun with function signatures
In a blog post on dependency inversion in Clojure I’ve discussed what this DI principle actually is about and the solutions Clojure offers to support it. There is one aspect that bugged me a little: For me, a fundamental challenge with DI in a langu