Fixing broken boot from encrypted devices after upgrade to Debian Bookworm

My workstation uses LUKS to encrypt two NVMe SSDs, on top of which I have lvm2 running to create one big filesystem; this has worked nicely for the last couple of years (more details follow below). But after the upgrade from Debian 11 (Bullseye) to Debian 12 (Bookworm), my system refused to boot.

More concretely, I was greeted as always with the prompt to decrypt my root device, which also worked as expected — i.e. I received the “setup device successfully” message from the initramfs. However, the boot failed to mount the resume / swap device and then gave up completely with the information that it couldn’t mount the root device, followed by the lovely message:

ALERT! /dev/mapper/pixie--vg-root does not exist.

Debug attempts

Initramfs debug shell

Searching the web led to a relevant-looking thread on forums.debian.net, which gave some initial ideas. In particular, I could make some progress using these two commands:

  vgchange -ay
  vgchange -ay --activationmode partial

Afterwards, my /dev/mapper/ showed the expected devices. However, I couldn’t mount them, hence also couldn’t follow the further advice there.

What is clear to me now but wasn’t at the time is that I should have exited the initramfs emergency shell (via exit) and tried to continue to boot the system. This likely would have worked and allowed me to debug and fix the issue from the machine directly.

Rescue mode from USB stick

Instead, I created a USB stick with the netinst installation image for Debian 12 and booted from the stick into rescue mode (found under the advanced options). This nicely detected my hardware and then prompted me to decrypt each of my SSDs.

That’s when I realized that both of my NVMe devices were of course encrypted separately and did not even use the same password. I had forgotten this, but my password store of course contained both passwords, so I could decrypt both devices.

Fixing the problem

Wrong device name

I had read in a reddit thread that Debian uses a modified crypttab(5) format, adding an initramfs option to identify the root device, resume devices and other devices that need handling in the initramfs phase of the boot. This option was missing, so I added it to the entries in my crypttab and tried to update the initramfs via update-initramfs -u.

Unfortunately, this threw an error about a source mismatch for one of my devices and also gave a warning that the cipher method for decryption could not be determined. After a while, I realized that my /etc/crypttab specified nvme0n1p3_crypt as the device, but that it should really be nvme1n1p3_crypt (note nvme0 vs. nvme1). I’m not sure whether this error had been in there all the time (and if so, why it worked before) or whether something changed with Bookworm that triggered a device name change.
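
One way to cross-check which device and UUID belong to which crypttab entry is to list the block devices with their UUIDs and to ask blkid which partitions carry LUKS headers:

  lsblk -o NAME,UUID,FSTYPE,MOUNTPOINT
  blkid | grep crypto_LUKS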

After correcting this error, I could finally update the initramfs and rebooted. Unfortunately, I ran into the same problem on boot.

Key file not accessible

This time while debugging in the initramfs emergency shell (and also searching for hints on the web), it became clear to me that my other device nvme0n1p1 was the actual culprit. In particular, I was using a key file (stored in /etc/keys/nvme0n1p1.luks) to decrypt this device automatically, which is also the reason I had forgotten about the two differing passwords for the devices.

So I edited the crypttab once more to no longer reference the key file, changing /etc/crypttab from:

nvme1n1p3_crypt UUID=[elided] none luks,initramfs
nvme0n1p1_crypt UUID=[elided] /etc/keys/nvme0n1p1.luks luks,initramfs

to the new version:

nvme1n1p3_crypt UUID=[elided] none luks,initramfs
nvme0n1p1_crypt UUID=[elided] none luks,initramfs

After another update-initramfs -u -k all, I could finally boot into my system again. Of course, I now have to enter both passwords on boot.

I’m not sure if this is a recent change, but apparently now you need to specify a keyfile pattern in /etc/cryptsetup-initramfs/conf-hook. So, I now have the following in this file:

root@pixie:/etc/cryptsetup-initramfs# grep ^KEYFILE_PATTERN conf-hook
KEYFILE_PATTERN=/etc/keys/*

After reverting the crypttab change shown above and re-running update-initramfs, automatic decryption of the second device works again.


Make it so! Decision making in software architecture

I find it interesting in discussions with developers that many have a bad picture of the job of a software architect. When I think back to my first encounters with a software architect, who was performing his document-reviewing job in a dedicated architecture department, I can totally relate, but then again it is 2021 and things should have changed [1]. After all, most developers have a keen interest in the architecture of the system they are working on being good, but still some developers do not seem to be aware of how much their work impacts the architecture (and vice versa). “Every interesting software-intensive system has an architecture. While some of these architectures are intentional, most appear to be accidental”, as stated in Grady Booch’s article on “Accidental Architecture”. Some people now seem to believe that accidental architecture is the new norm, as “the best architecture, requirements and design emerge from self-organizing teams” (from the principle in the Agile Manifesto). In contrast, in organizations which have dedicated software architects, some people seem to think that it is the architect’s responsibility to ensure that there is a well-informed design and to make the hard decisions as necessary. These hard decisions would probably be about “the important stuff (whatever that is)” that is called out in Martin Fowler’s article “Who Needs an Architect?”.

But what does that mean for the daily work? Should developers send all questions to the architect for her to make decisions, and should architects sit in their ivory tower pondering them and ordering decisions by importance? Not in my view of the world. I’m a big fan of the role model that Fowler describes as Architectus Oryzus, or the architect acting as primus inter pares as in Gregor Hohpe’s article “Agile and Architecture: Friend, not Foe”: an architect who works closely with the developers, tries to identify important topics and makes sure they are addressed at the right time.

Sketch image showing "A Design thinking Workshop", by Jose Berengueres, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons.
A Design thinking Workshop
Image by Jose Berengueres, CC BY-SA 4.0, via Wikimedia Commons.

Stefan Toth has a fantastic (German-only) book on agile processes for software architecture, Vorgehensmuster für Software Architektur, which describes patterns for how to make this happen in daily practice. The core point of the book is that architecture should be a shared responsibility. Everybody should try to raise and address important concerns — whether there is a dedicated architect role or not is a separate issue. Toth lists patterns like identifying quality scenarios, listing architecture topics in the backlog, ad-hoc architecture meetings, common decision making and testing for architectural concerns (and many more), many of which are very team-centric. The book is so great because it shows that it is easy to make architecture work a natural aspect of the development workflow rather than some high-ceremony activity.

In my experience, technical discussions are often concerned with architectural topics, so you might not even need to bring people together for formal architecture meetings — nevertheless technical design review meetings can be beneficial if they involve people that can and do share new perspectives but who would otherwise not be working so closely with the team. Similarly, if your team already adds technical debt and other non-functional items into the backlog, you probably are already following two of the recommended practices.

The role of the architect is then not so much that of a decision maker but rather that of the person taking responsibility that architectural aspects are considered by the team. In bigger organizations this implies that the architect should work closely with the development team, but not only with them. While ideally the developers also collaborate closely with every other important stakeholder (product owner, devops / operations, other dev teams), it is quite vital for an architect to understand the different (and potentially conflicting) quality needs of these parties. Again, probably even better than having, e.g., 1:1 meetings between the architect and the security officer, so that the architect can then point out the resulting constraints to the development team, is for the architect to make sure that a security expert is involved in design and requirements discussions.

Hippo with tongue stuck out (via Pixabay)
Highest paid person’s opinion.
Image taken from Pixabay

However, the job of an architect does not only consist of organizing meetings with the right people. In the end, intentional architecture needs decisions. “Common decision making”, as listed above, means two things to me: first, people who are affected by a decision (e.g. the developer who has to implement the result) should be directly involved in the decision-making process. In teams that do not have a dedicated architect this naturally seems to be the case, but it is worth pointing out that this can easily degrade into single-programmer decisions. The other anti-pattern here would be that the architect, likely the highest paid person in the room, makes a HiPPO decision or specifies a design while not being close enough to the problem and / or the existing code base. [2] The effect can be devastating if the decision turns out to be wrong: not only will people think that the architect did not listen to the team, but also that the architect might not have the necessary competence. This can result in serious psychological safety and trust issues. A good working approach to avoid such problems is running small experiments (spikes) to validate options.

The second aspect, which is even more important, is that a decision is made in the first place. The idea of deciding at the last responsible moment notwithstanding, I find it surprising how hard it can be for organizations or a group of people to make decisions. It is important for an architect to be able to guide a group towards making a decision, but also to be comfortable making a decision themselves if the group cannot reach an agreement. The biggest issue here are the trade-offs of potential solutions: e.g., one solution might have benefit A (e.g. scalability) while being weaker in B (e.g. ease of use), and another solution has the opposite characteristics. Often different people will have different opinions on whether A or B is more important in the selection process. It is very helpful for teams to come to an understanding (not agreement) about the different weights that people give to these characteristics, and then hopefully the architect can explain why she thinks that one argument is more important than the other. Coming to a decision is more important than convincing everybody, though: aim for solutions that everybody in the team can agree with, even if they would still prefer a different one. Also, it is a good idea to document decisions in a lightweight fashion (e.g. in a small Architecture Decision Record).
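
To illustrate what “lightweight” can mean: an ADR in the style popularized by Michael Nygard fits on half a page. The content below is entirely made up and only shows the typical sections.

  Title: 7. Use the existing message queue for order events
  Status: accepted
  Context: Order events need to reach two other teams; we already operate a message queue.
  Decision: Publish order events to the existing queue instead of introducing a new broker.
  Consequences: No new infrastructure to operate; consumers must tolerate at-least-once delivery.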

If the developers run into obstacles during implementation, it is quite likely that they will figure out a different solution than the one decided on. This is another case where the “architect as lonely decision maker” can lead to a lot of unnecessary friction. In practice, the choice of a different approach is not a problem as long as it is handled in a similarly responsible way, where the developers take a look at the implications and trade-offs of their choice and involve others (and, please, update that ADR).

Footnotes:

[1] I am obviously not talking about so-called Enterprise Architects who are mostly concerned with governing how many systems at a company interact to provide the most business value. Instead the focus here is on software architecture for a single system (which, of course, is likely to interact with many others).

[2] Sometimes the opposite scenario might be true: the team consists mostly of people who do not have enough expertise, whereas the architect has long experience with the code base. In this situation, the architect can help the team by sketching a design approach and sharing his knowledge. Still, there should be room for questions and further ideas from the team.


Color is the key -- configuring my Dygma Raise keyboard

In 2020, I finally came to the point where I was ready to invest some money in my health. So I bought a standing desk and a new keyboard, a Dygma Raise, which is an ergonomic split keyboard with mechanical keys. The main reason to buy the Dygma instead of some other ergonomic or mechanical keyboard was that I wanted to be able to keep my arms straight when using the keyboard — with a regular keyboard I always need to move my arms a little inward in front of my chest, which leads to a body position where I’m not keeping my back straight, which in turn contributes to back pain. As you can see from the image below, I move the two parts considerably apart from each other, to the extent that the rather short connection cables from the keyboard to the so-called Neuron allow it. Fortunately these are just USB-C to USB-C cables, which means I can simply buy two longer cables when I want to (didn’t get a round tuit yet).

Picture of my dygma raise

Configuring the Dygma Raise

The Dygma Raise is a highly configurable keyboard. Not only can you order various layouts (ANSI, ISO) or colors, you can also order different key switches and change keys as you see fit — yes, I’m talking about the hardware keys here. The “my raise” page gives a nice overview. I ordered a German layout and Cherry Brown switches (cf. the nice switch overview), which give nice tactile feedback without being too loud. I have not fiddled with the keys themselves so far, but would like to talk about the various key layers and their configuration via Bazecor, which works on Linux, Mac and Windows. I only tried Bazecor on Linux (Debian 10), where they provide an AppImage which so far has been working well (minor point: the user has to be in the dialout group, otherwise you need to start the application as root).
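
Adding yourself to the dialout group is the usual one-liner on Debian; you need to log out and back in for the group change to take effect:

  sudo usermod -aG dialout $USER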

Initial impressions

My initial impression, coming from a pretty regular Cherry G86, was: “Great haptic feeling, but oh my, this will take time to get used to.” During my initial attempts to use the keyboard I recognized how often I apparently had been looking at the keyboard before: when I moved the two halves apart and did not look at the keyboard, I would utterly mistype. I overcame this by really practicing typing blindly with the two parts put together, so that the typing was more akin to what I was used to. Over time, I got better at this and can now type well without having to look at the split halves.

The other big problem for me was that the default key layout has no cursor keys on Layer-0 at all; the cursors are on Layer-1 on the keys W, A, S, D. To get to Layer-1, I could either use the transient Shift to Layer-1 key (the left middle lower thumb key) or the persistent Lock to Layer-1 key (the right middle lower thumb key). So what had always been a single keystroke now required two key presses in a totally different area of the keyboard. Add to this all the modifiers that I’m constantly using, e.g. C-M-left to switch screens, and now I have to press yet another key to switch the layer? Not nice. I similarly missed Page-Up and Page-Down, which were not even configured for Layer-1. And then, what do I use instead of the media keys I used on the Cherry? That should be easy to configure, right?

Goals

It became very clear very quickly to me that I would have to adjust my typing behavior. However, I currently use two machines in parallel and I only have one Dygma Raise. Obviously, I need to configure replacements for at least some of the keys that are missing in comparison to my old keyboard.

The other thought was that I use a lot of key combinations in the various programs I use, and here I have ten layers to configure, so surely I should be able to ease some of my typing? I.e., in Emacs, instead of hitting C-M-SPC C-w to kill the current S-expression, it would be nice to just hit a shorter key combination.

And finally, the Dygma Raise has colors like a rainbow, so …

  • keep layer 0 configuration close to a normal PC-105 layout
  • use layer 1 for missing keys, especially for the media play/stop key
  • use layer 1 and others for shortcuts / special needs
  • setup consistent and meaningful color usage

After some time, I settled to use three layers like this:

Layer 0

As stated in the goals, layer 0 is set up to be pretty normal, see below. E.g., I set all four “space” keys to yield space, so this works like a single big space key. Also, unlike the pains described in Kari Martilla’s blog about his Dygma Raise, there are no changes to the parentheses whatsoever, although they are placed just as problematically on the German layout as on the Nordic layout.

To address the missing cursors, I configured the lower row of thumb keys for the cursors, which makes their use very easy, as I only need to hit a single key that is also very easy to reach. For switching virtual desktops in LXDE/Openbox, I use C-left or C-right, and this is even easier now, as I can move to the right with Right-Control, too, whereas before my right hand would have had to wander off to the cursor block.

Layer 0 configuration

One key difference is the Escape key: on a PC-105 keyboard with German layout, that is where you would find the caret (^), but I decided to keep Escape bound to that key. The caret is to be found on layer 1. And then there are the Dygma and the FN keycaps: they are bound to Shift-to Layer-1 and Layer-2, respectively.

Note also the color usage, which clearly demarcates group boundaries, e.g. between the space and alt keys.

Layer 1

Layer 1 mostly holds the keys which I need often but for which there is no room on Layer 0. I.e., the caret is on the Escape key and the function keys (which I actually only very rarely use) are on the number keys. For the remaining movement type keys, my line of thought went like this: Q and A go to Home and End, W and S to Page Up and Down, because that positioning resembles the key ordering of these keys on a regular PC104/105 layout. Delete and Insert, however, are needed more often and hence can be reached via the two lower left thumb keys.

Layer 1 configuration

I set the media keys (next, previous, louder, quieter) to N, P, +, -, respectively, which makes them easy to memorize. The start/stop toggle is bound to FN, so that (on layer 0) I can simply hit the Dygma+FN combination.

You can also see that Menu, LED cycle and Move to Layer 1 are configured on this layer, but I don’t use these keys much. I’m currently experimenting with additionally putting the parentheses on layer 1 on K, L, Ö, Ä to see whether that makes hitting the brackets and parentheses [, ], (, ) that I need most often for Clojure programming easier than on the regular keys, but I’m not sure yet how well that will work.

A lot of keys are configured to produce nothing, grouped in the default color of the layer, whereas the configured keys pick up the colors from layer 0 plus the dark blue for the media keys.

Layer 2

With Bazecor versions beyond 0.22, it is now also possible to configure macros, i.e. sequences of keys. Initially that wasn’t working for me: you have to upgrade the firmware of the Dygma to enable it. That’s mainly what I started to configure a layer 2 for. Currently I have configured only two such macros: Q (i.e. Fn-Q when I’m in layer 0) triggers a sequence of keys that, when pressed in a Lisp mode in Emacs, will select the current S-expression and indent it. It is bound to Q as this is somewhat similar to what M-q does (formatting a paragraph). W triggers a key sequence that in Emacs will select the current S-expression and kill (cut) it, similar to C-w.

After I had set this up, I thought about doing this with a regular Emacs keyboard macro or Elisp function instead and then invoking those with the keys. This would offer two benefits: I could assign this just to the modes where the combination makes sense, and I would have the nice side effect that invoking these functions works independently of whether the Dygma is in use. As it is, if I now hit Fn-Q while not being in Emacs, the application in use will receive the defined key macro sequence and who knows what happens then. Then again, the entire point is to make use of the Dygma to reduce the number of keys to be pressed.

I also set up ‘-’ to generate C-_, which will trigger an undo in Emacs, thereby on my German keyboard saving me one key stroke (Fn-<-> instead of C-Shift-<->).

Layer 2 configuration

I also configured the Dygma key to toggle media start/stop, so that it doesn’t matter if I shifted to Layer 1 or 2. And, finally, there is an experimental alternative configuration of some movement keys (Home, End, PgUp/Down) on the thumb keys.

Conclusion

The Dygma Raise is an expensive keyboard, no doubt. Did it help with my back pain? Not so much, unfortunately. But it is an amazing keyboard that can be configured in lots of ways. Would I recommend it? I surely would. Oh, and did I mention that it comes with a case to carry it around in? Ah, keyboard nerds galore.


Who Am I and If So How Many? Using multiple Firefox profiles.

So, I’ve changed my Firefox setup quite a bit over the last months. I now work with multiple profiles, e.g. one for online banking, one for social media use, one for development purposes and one for “default” use (i.e. pretty much everything else). I’m quite happy with the experience.

Usually I have three of the profiles running in parallel, showing on different virtual desktops of my Linux box. I.e., on desktop 2, I usually have my development environment open, so I’ll open the “development” browser there; desktop 4 runs all the applications for interacting with other people (mail etc.), so “social” goes there; and the default one will show on virtual desktop 5.

Why would you do this?

One direct effect is that the number of open tabs in any given browser window is a lot smaller and also better grouped than before. I.e., any web page I need for my current development project I will only open in my “development” profile. I will only open my bank’s page in the banking profile, so this browser window will never show anything else.

The other main benefit is that I can have profile specific configurations. E.g., I nail down my “default” profile with NoScript which is not really useful on my “development” profile, whereas I don’t need e.g. the React Dev tools on the “social” or “default” profile.

Dedicated profiles can also help with security, e.g., using a dedicated profile can lower the attack surface for online banking: When you don’t browse to other sites with the same browser/profile, any XSS/CSRF issue on these “other” sites for sure can’t affect your online banking connection.

The profile I use exclusively for online banking is also highly locked down and in addition uses a different theme, so that it is visually obvious that I’m working with this profile.

Get me started, please

To start using multiple profiles, you have to run Firefox with the -P switch, which will start the Profile Manager that allows you to create new profiles, delete profiles etc. Alternatively, if you have Firefox already running, browsing to about:profiles will also allow you to manage your profiles.
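
If I recall the option correctly, profiles can also be created non-interactively from a shell; the profile name “social” is just the one I use:

  firefox -CreateProfile social
  firefox --no-remote -P social &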

For a while, I just started the non-default browsers manually over the command line, by just opening an xterm and running firefox --no-remote -P social &. But I finally created some additional local .desktop files (cf. the Arch wiki page on xdg desktop files), so I can start Firefox from the desktop. I.e., I added a file $HOME/.local/share/applications/socialbrowser-usercreated.desktop with the following content:

[Desktop Entry]
Name=Social Firefox
Comment=Browse the World Wide Web
Comment[de]=Im Internet surfen
Exec=/usr/lib/firefox-esr/firefox-esr --no-remote -P social %u
Terminal=false
X-MultipleArgs=false
Type=Application
Icon=firefox-esr
Categories=Network;WebBrowser;
MimeType=text/html;text/xml;application/xhtml+xml;application/xml;application/vnd.mozilla.xul+xml;application/rss+xml;application/rdf+xml;image/gif;image/jpeg;image/png;x-scheme-handler/http;x-scheme-handler/https;
StartupWMClass=Firefox-social
StartupNotify=true

This will then create a menu entry in the “Internet” submenu in my application starter menu in my desktop environment (because that’s where the given categories will create entries).

Any drawbacks?

One annoying thing is the profile selection that Firefox pops up when you don’t specify a profile on the command line. If you select the option “Use the selected profile without asking at startup”, then it will not be easy anymore to use a different profile — the only way then is really to use -P again.

This default profile selection becomes a problem especially when you want to open a link from a different application (e.g. from your mail program), because then you can’t decide which of the running browsers/profiles will open the link; it will always try to use the default one. I have seen varying behavior in what Firefox does when you don’t select a default profile: sometimes it just picks one running profile successfully, but I’ve also seen it open the profile selection dialog again. In that case, if you select a profile that is already in use, Firefox will treat it like an attempt to open the profile a second time, resulting in an error. My current workaround for this particular issue is to set the default browser to Chromium via xdg-settings set default-web-browser chromium.desktop.

The other hassle when working with multiple profiles is bookmark management, as I want some bookmarks to stay local to one profile while most should be shared. I can use Pocket for the shared ones, of course. However, I often just copy and paste the URL manually to the “default” browser, which serves as the main bookmark keeper. I really should move away from this completely and instead use the bookmark extension of my Nextcloud installation.

Overall, for me the benefits clearly outweigh any drawbacks.


File download with ClojureScript

As I couldn’t find a recipe on how to provide some data from a ClojureScript application for download, here’s how I did it. If you already know how to do this in JavaScript and have done any CLJS-JavaScript interop, there’s nothing new for you to learn here, as this is a pretty straightforward translation of how to use the Blob API and click a temporary link in JavaScript:

;; assumes [cljs.pprint :as pp] in the ns :require
(defn file-blob [datamap mimetype]
  (js/Blob. [(with-out-str (pp/pprint datamap))] #js {:type mimetype}))

(defn link-for-blob [blob filename]
  (let [link (.createElement js/document "a")]
    (set! (.-download link) filename)
    (set! (.-href link) (.createObjectURL js/URL blob))
    link))

(defn click-and-remove-link [link]
  (let [click-remove-callback
    (fn []
      (.dispatchEvent link (js/MouseEvent. "click"))
      (.removeChild (.-body js/document) link))]
    (.requestAnimationFrame js/window click-remove-callback)))

(defn add-link [link]
  (.appendChild (.-body js/document) link))

(defn download-data [data filename mimetype]
  (-> data
       (file-blob mimetype)
       (link-for-blob filename)
       add-link
       click-and-remove-link))

(defn export-data []
  ;; some-state is assumed to be an atom holding the application state
  (download-data (:data @some-state) "exported-data.txt" "text/plain"))

I’ve tried to break it down into pretty self-explanatory pieces, but here is a bit of explanation: export-data would be used as an on-click handler on some UI element and would be expected to gather the data to be exported in some way. Here, we’re just assuming the data is already stored in some state atom and is a map. file-blob pretty-prints the data, declares the content to be of the given MIME type (e.g. text/plain) and then returns the newly created blob. Of course, you might want to change the pretty printing or the MIME type, depending on your data.

We’re not doing anything fancy with the created Blob object; we simply hand it over to URL.createObjectURL, whose result we use as the href attribute of a newly created anchor element. Setting the download attribute tells the browser not to navigate to the URL. After this we simply add the new link to the DOM and then execute the download by dispatching a MouseEvent on said link. This actually triggers the download, i.e. the browser will open a file save dialog with the suggested name, so the only thing left to do is to clean up the link from the DOM.
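
One small addition that is not part of the code above: browsers keep the object URL created by URL.createObjectURL alive until it is revoked, so if you generate many downloads it may be worth releasing it again. A minimal sketch:

(defn revoke-blob-url [link]
  ;; release the temporary object URL created for this anchor
  (.revokeObjectURL js/URL (.-href link)))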


Threadripping under Debian

Getting a Threadripper machine to work under Debian

After a long, long time of more than eight years, my old machine started showing hardware problems: first the power supply failed and had to be replaced, then the CMOS battery died. It became clear that a replacement would finally be in order. When the first reports about AMD’s Ryzen family, and in particular the Threadripper tests, came out, I became interested. In the end, I waited until the Threadripper 2950X was available before ordering a custom-built machine from a local vendor. Here is the basic hardware setup:

  • CPU: Threadripper 2950X (16 cores / 32 threads)
  • Mainboard: MSI 399X Carbon
  • Memory: 32 GB
  • GPU: Nvidia 1060
  • SSD: Samsung 970 Evo NVMe 1TB

This post describes what I had to do on the software / Linux side to bring the machine to a usable state.

Basic installation

As a Debian user of old, and knowing that the latest Debian stable (Stretch) would not provide a recent enough kernel, I installed Debian Buster. This went more or less flawlessly. However, I can’t help but note that the setup of LVM and disk encryption (i.e. lvm2 with LUKS) provided by the Debian installer is really bad. It hasn’t improved one bit in ages: the automatic partitioning will come up with a crazy sizing scheme, and there is absolutely zero useful support when you go manual. It’s easy to end up with an installation which looks like it succeeded, but rebooting ends up on the rescue console because mounting the root volume group fails.

The only other noteworthy thing is that I directly installed the proprietary Nvidia drivers because the free nouveau driver doesn’t really support any of the “newer” and more advanced 10X0 cards from Nvidia. As I am a LXDE user, that was of course the desktop environment I installed. However, I installed LXQt and KDE along with it, to also learn about their current state, but went quickly back to LXDE — LXQt is supposed to be the successor of LXDE but it’s currently not really in a comparable state.

I also installed Gnome briefly, only to discover that it’s trying to use Wayland which apparently has problems with the Nvidia drivers. I couldn’t make it work and as I was never a big Gnome fan anyway, I simply threw out Gnome again.

systemd-udev crashes and kernel compilation

With the system up and running, the first problem I noted was systemd-udev constantly crashing. This also prevented suspend/hibernate from working. A longer internet search finally revealed this systemd-udev bug report in Red Hat’s bugzilla. Apparently “the BIOS/firmware is advertising it supports SEV, when in fact it doesn’t”, where SEV is AMD’s secure encrypted virtualization technology. If the kernel config option CONFIG_CRYPTO_DEV_SP_PSP is set to yes, then the kernel will use it, and apparently most distribution-provided kernels set it — the one from Debian Buster (4.18.0-2 at the time) surely does.
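
To check how the currently running kernel was configured, the packaged config file in /boot can be grepped:

    grep CONFIG_CRYPTO_DEV_SP_PSP /boot/config-$(uname -r)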

There are apparently three ways to fix this issue:

  • wait until the BIOS/firmware is fixed to no longer provide the wrong information,
  • wait for a newer kernel to provide a workaround,
  • or compile the current kernel with CONFIG_CRYPTO_DEV_SP_PSP set to no.

Apparently, since 4.19-rc5 the kernel has a workaround for the issue, but at the time of writing Buster and Sid only have 4.18. So, off to compile a new kernel without that setting. This turned out to have become a lot more complicated since the last time I had to do this on a Debian system (which was still using make-kpkg at the time, so yes, it has been several years since I last needed to compile a kernel). It also doesn’t help that quite a bit of documentation out there is outdated — the Debian Kernel Handbook seems to be the proper documentation.

Unfortunately, it’s still easy to get it wrong. I ran into issues with certificates (why exactly you would need certificates to compile a kernel is beyond me; my wild guess is that it’s related to “secure” boot), my changes to the configuration were overwritten during the process, and there were other issues. In the end, the following process “works for me”:

    # cf. chapter  4.2.3, generate the setup for amd64
    make -f debian/rules.gen setup_amd64_none_amd64

    # fire up the config dialog, now enable RCU-BOOST and disable PSP
    make -C debian/build/build_amd64_none_amd64 xconfig

    # build the kernel
    make -f debian/rules.gen binary-arch_amd64_none_amd64

The only problem with this is that the generated kernel will have the same version numbering as the official package. So, a newer minor version of the Debian package can overwrite your manually built package. I haven’t really looked into this yet, but will update this section once I do (and yes, the process described in section 4.3 of said manual didn’t work for me).
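
For completeness: the build produces ordinary Debian packages next to the source tree, which are then installed with dpkg; the exact file name depends on the kernel version that was built:

    sudo dpkg -i ../linux-image-*.deb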

By the way, don’t expect kernel compilation to be a fast process. Apparently, the kernel configuration that Debian uses out of the box compiles half the world, and consequently this takes ages even on such a high-end machine.

The newly compiled kernel fixed the systemd-udev crashes, and then suspend / hibernate worked as well. However, when I triggered hibernate from the LXDE logout dialog, the machine wouldn’t power off. Another long round of searching the web, including reading the source of lxsession-logout.c, revealed the solution: disable upower via systemctl disable upower.

Random hangs

During my first attempt to compile a kernel, my system just hung. Completely: not even the magic SysRq worked anymore. That’s apparently another known issue, but there are no good solutions. Setting RCU_NOCB_CPU and RCU_BOOST is one attempted workaround (cf. the soft hang discussion in the AMD forum), but this didn’t really help me (cf. also the random soft lock discussion on the kernel bug tracker). However, the ZenStates github repo that is also linked there contains a Python script which disables the C6 state.

Again, the forum suggests that the issue might be fixed with newer BIOS versions, but my mainboard has the mentioned AGESA version and the issue still occurred. Disabling C6 state per the script fixes the problem, but results in higher energy consumption and is hence not exactly a perfect solution either. If you want to run zenstates automatically, there is also a systemd template for zenstate. Note that this needs some additional tweaking (which I didn’t get around to yet) to run modprobe msr.
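
One simple way to take care of the modprobe msr part, which I haven’t battle-tested yet, is to have systemd load the module at boot via modules-load.d:

    # /etc/modules-load.d/msr.conf
    msr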

PCI errors

One other thing I noted in the logs was recurring PCI errors. There are a number of suggested fixes, cf. this PCI error answer on askubuntu suggesting pci=nomsi or pci=noaer, or this other PCI error suggestion on unixstackexchange to use pci=nommconf. For me, pci=noaer hides the errors successfully, and for the moment that’s good enough (read: I haven’t investigated whether the other suggestions would actually fix the issue as well).
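
For reference, on Debian such a kernel parameter is set persistently in /etc/default/grub, followed by a run of update-grub; the quiet below just stands in for whatever is already in that variable:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=noaer"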

Virtualization

The latest thing I ran into was that VirtualBox was refusing to start a virtual machine, claiming that AMD-V was disabled in the BIOS. This turned out to be the case: SVM was disabled. I couldn’t easily find the option at first and had to use the search function in the BIOS.

Two things I haven’t tried out yet

The two things I haven’t used yet are the goodies the motherboard provides over the one I originally wanted: Bluetooth and WLAN. I did install the Intel firmware to support both features, but can’t really say anything more about them yet.

Conclusion

I don’t have a conclusion just yet. It’s clear this new machine is a lot faster than my old box, but given that the Core2Duo CPU is quite long in the tooth now and that the old machine didn’t have an SSD, that’s to be expected. Although this article has gotten longer than I would have hoped for, overall the machine is running quite well (and that with me running a testing version of Debian, which usually isn’t particularly well supported). In particular, I can’t say I have run into any issues with the peripherals so far. I guess the machine will have to show its power over the next months, and hopefully there will be some more fixes to BIOS/firmware and the kernel as needed.


Fun with function signatures

In a blog post on dependency inversion in Clojure I discussed what this DI principle is actually about and the solutions Clojure offers to support it. There is one aspect that bugged me a little: for me, a fundamental challenge with DI in a language like Clojure is that you often have simple functions depending on other simple functions (simple here in contrast to protocols). In the article linked above I discussed one way of resolving function dependencies by using an indirection over a service locator. However, in practice, writing service locator lookups by hand quickly got tiresome. So instead I decided to have some fun throwing together some macros that handle function signatures, which resulted in a small library.

Enter funsig

funsig shoots lower than Clojure protocols: it provides dependency management on a per-function level. What this means is simply that you can define a function signature with defsig and then provide implementations with defimpl. Implementations will depend on the signature. Let’s say we have some application code that depends on a printer function:

(ns my.onion)

(defn printer [string]
    (println string))

(defn print-account-multiplied [account multiplier]
    (let [result (* account multiplier)]
        (printer result)))

One might want more flexibility on how and where to print. In other words, one might want the application code (print-account-multiplied) to depend on an abstraction (printer) only and not on the concrete implementation as in this example.

Funsig allows you to invert the dependency on the printer implementation. You would define the signature with defsig and have the application code depend on the signature like this:

(ns my.onion
    (:require [de.find-method.funsig :as di :refer [defsig defimpl]]))

(defsig printer [string])

(defn print-account-multiplied [account multiplier]
    (let [result (* account multiplier)]
        (printer result)))

You can then provide the implementation with defimpl:

(ns my.onion.simple-printer
    (:require [de.find-method.funsig :as di :refer [defimpl]]
              [my.onion :as mo :refer [printer]]))

(defimpl printer [string]
    (println string))

Note that the implementation has a dependency on the signature, not the other way around. Also, application code (print-account-multiplied) simply depends on the signature — here the signature is in the same file, but referencing the var in another namespace (i.e. using require with :refer) also works as usual. For application code, this looks like dependency injection.

funsig will also allow you to have multiple implementations and then select the one you want.

Have fun!

You’ll find all the gory details in the README and / or in the intro document. The latter also explains the relationship to the service locator pattern mentioned above.

Feedback welcome!


Technical debt equals missed quality requirements

Technical debt assumes that you have an existing system and that you already know about the areas where the duct tape is getting thin. Let’s discuss the problem, i.e. the conflict around technical debt, a little. It’s basically always the same battle: the people maintaining the system probably have good reasons why they want to pay back some of the accumulated technical debt, whereas the product owner believes more functionality is more important.

This is a typical clash of interests between people from different tribes and usually from different departments: the developers report to a technical manager, the product owner to a business guy. The IT manager is typically under pressure to minimize costs, in particular costs for support and maintenance, so he’s interested in paying back technical debt during projects. The business manager, however, is under pressure to convince new clients with new features. So it doesn’t come as a surprise that technical and non-technical people set different priorities between technical debt and new features: naturally, to the business side these are all technical issues which they don’t consider to be their concern. It’s only natural that they believe such technical issues should be solved by the technical folks “somehow”. After all, it was these guys who built the system with all these problems in the first place, and now they want to spend more time and money on it? Although I’ve maybe exaggerated a little here and this line of argument is way too simplistic (given that debt often accumulates over time), it’s still quite popular. Such product owners seem to believe that they are only responsible for developing new functionality. Corporate culture can contribute to this: some companies have a culture where every new glitter is taken for a star and gets way more attention than the cash-cow functionality that keeps existing clients happy.

So some product owners will see technical debt as a separate issue which needs to go on a maintenance budget, a budget that somebody else is responsible for, e.g. the IT department. The problem with that idea is that in a culture where building new features is the only thing that counts, typically no one ever gets around to cleaning up the technical debt. All that will happen is that some people fix bugs from this maintenance budget. So you never actually spend the money on the technical debt itself but only on its results, which of course doesn’t address the root of the technical issues. Also, if you finally do get approval for a refactoring project, these are really the most boring and horrible types of projects to work on. So you’re not likely to see a lot of happy, motivated people on such projects — and sure enough, the most competent people are probably assigned to more valuable projects anyway. As a result, and because there is no business pressure on such refactoring projects, as soon as something else comes up, such refactoring initiatives are abandoned or at least downsized to a minimum. Peter Seibel describes in a recent, lovely article the need for an Engineering Effectiveness team at Twitter, in which he lays out in some detail how hard it is to really deal with technical debt.

Let’s drill down on technical debt from the point of view of requirements. Wait a minute, this doesn’t sound right — how could technical debt be a requirement? Obviously, it isn’t, but there is a close relation. Technical debt can come in different flavors: from “you really should install a second machine and a load balancer so you’re prepared for failure”, over “we really need to rewrite module foobar.clx to finally get rid of all the spaghetti code that’s slowing us down”, to “oh, we really need to support responsive design, we’re getting more mobile users every day”. Now, if you take a look at these three simple examples, they are all related to quality requirements: the first is about availability, the second about maintainability and the last about usability. Take a look at the ISO 25000 standard for software product quality page listing all the quality attributes one might want to consider. Technical debt is always tied to some quality that your system should offer, but doesn’t.

A reason for this is that quality attributes are often not made explicit during requirements gathering, regardless of whether you’re following an agile approach or not. The less technical the product owner, the more often they seem to assume that performance, scalability, security and the other -ilities will come out just right by magic. Technicians, on the other hand, know perfectly well that they will not. It’s worse if they don’t know that: they will build something that might fulfill the functional requirements but not the non-functional ones. “Works on my machine” might be fine for a naive developer, but nobody will be happy if it takes 5 minutes to load the page under the gazillions of parallel requests in production. There is a reason why software architecture is mostly concerned with quality: if you don’t plan and build for scalability, it’s unlikely you end up with a highly performing system. Take a look at the picture: this bridge wasn’t planned to cope with the amount of traffic it sees nowadays; some time in the 80s somebody decided it would be okay to have three lanes instead of two and didn’t think through the long-term consequences.

Rheinbrücke A40 Duisburg-Neuenkamp
The bridge of the highway A40 over the Rhein in Duisburg. It is damaged so bad that trucks are no longer allowed to cross it.
Picture made by kaʁstn Disk/Catstitched by Daniel Schwen, licensed under CC BY-SA 3.0 de over [Wikimedia Commons]( https://commons.wikimedia.org/wiki/File:Rheinbruecke_Neuenkamp_pano.jpg#/media/File:Rheinbruecke_Neuenkamp_pano.jpg)

Of course, one might rightly point out that technical debt can also accumulate over time: sure, one machine was enough for the requirements at the start, but nobody followed up on the increase in users, and nobody cared (or had time) to clean up the code in module foobar.clx back then (nor in the following six years of quick bug fixes and minor one-off patches that have grown on it like leeches). This is a sign that nobody actively kept an eye on how the world in and around the system changes, and nobody took action early on. For code quality, Uncle Bob points to the Boy Scout rule, which says that you should always clean up (regardless of who messed it up) — another way of trying to ease maintainability by paying back a bit of technical debt every day.

The overarching point here is that you as a technician, and you too as a product owner, need to think about the quality requirements of your system, and to do that over the entire life cycle. What was a fitting solution at one time might turn into technical debt over time. This means your system no longer holds up to the quality requirements you and your clients need, and no matter what the cause was, you’re better off fixing it now than accumulating even more technical debt.

Isaac Sacolick describes in his article on How to get an agile product owner to pay for technical debt how to address the common problem of technical debt head on, mostly from a manager’s perspective, via process and by making people (mostly the technical lead) responsible. However, I think it makes a lot more sense to try to ensure a common understanding between the developers and the product owner. One way to go about this is to use quality scenarios: as a member of the dev team, ask your product owner how many users the system needs to handle. Ask her whether she thinks it’s okay if somebody else might find a security issue before you do. Or whether she thinks it’s okay when a seemingly small change takes three weeks because the code is an impenetrable mess that nobody has understood since Dieter left. These sorts of questions hopefully open up discussions based on business value, ROI and cost of delay. You are then using the language product owners understand, and you might also learn something along the way (“no, we expect this many users to do X in the future, so investing in module Z doesn’t make sense”).
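
To make the idea of quality scenarios more tangible, here are two made-up examples of the kind of statement I have in mind; the numbers are purely illustrative:

  • Performance: when 500 users search for a product during the evening peak, 95% of result pages are delivered within two seconds.
  • Maintainability: a developer who is new to the team can add a new payment provider within five working days without changes to unrelated modules.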

Now, granted that doesn’t always work. An old colleague of mine always denounced the old system he worked on as being exactly the way it was ordered. Isaac’s article might offer some advice here, of which the most important one is probably that paying back technical debt is way easier if you have a CIO or someone with similar power supporting you.


In-memory database fixtures with Clojure and sqlite

For a recent project, I needed to extract data from a sqlite3 database. Writing the Clojure code to retrieve data was very straightforward with clojure.java.jdbc and java-jdbc/dsl. Naturally, I wanted to have some tests for this code as well. In a previous Python project, I had a lot of fun using sqlite’s in-memory feature to run very speedy database tests, so of course I wanted this for my current Clojure project, too. This turned out not to be as easy as I had expected, though, so I’m documenting it here for the next naive soul. My initial attempt with clojure.java.jdbc, java-jdbc/dsl and midje looked basically like this:

    (def testdbspec
      {:subprotocol "sqlite"
       :subname ":memory:"})

    (defn make-bookmark-table []
      (jdbc/with-transaction [db testdbspec]
         (jdbc/db-do-prepared db
           (ddl/create-table :bookmarks
                      [:id :int :primary :key]
                      [:type :int]
                      [:title "longvarchar"]))))

    (defn add-bookmark []
      (jdbc/with-transaction [db testdbspec]
         (jdbc/db-do-prepared db
           (str 
              "INSERT INTO bookmarks (id, type, title) "
              "VALUES ('12453', '2', 'a bookmark')"))))

    (defn setup-database []
       (make-bookmark-table)
       (add-bookmark))

    (facts "Testing database access to bookmarks"
       (with-state-changes [(before :facts (setup-database))]
            (fact "We can retrieve a list of bookmarks"
                (fetch-tags :dataspec testdbspec) => [{:title "a bookmark"}])))

This will fail quite early, because basically as soon as the with-transaction in make-bookmark-table has finished its work, the connection to the database will be closed. As a result, when the next with-transaction or jdbc/query is run, you’ll connect to a fresh in-memory database which doesn’t have the tables you just created. My old Python test code didn’t have this problem, because the setUp method of the TestCase would create the database connection (via sqlalchemy’s create_engine) and keep it alive until the TestCase’s tearDown method ran.

I tried returning the database connection from make-bookmark-table, but this just results in a “connection closed” error. Unfortunately, clojure.java.jdbc doesn’t support opening and closing the connection yourself. Sure, you can use get-connection, but you can’t feed this into either with-transaction or query. query uses with-open internally, which will conveniently close the connection for you. In a post on the perils of dynamic scope, Stuart Sierra calls this the Dynamically-Scoped Singleton Resource and files it under ‘anti-pattern’. I got bitten by pretty much exactly what Stuart describes: when dealing with sqlite’s in-memory feature, we would like to manage the connection ourselves, but we can’t.

After banging my head against this for a while, the only option I could come up with was to extract the relevant with-transaction from the setup code. Instead, you have to wrap the tests in the transaction and then call the setup code, like this:

    (defn make-bookmark-table [db]
      (jdbc/db-do-prepared db
           (ddl/create-table :bookmarks
                      [:id :int :primary :key]
                      [:type :int]
                      [:title "longvarchar"])))

    (defn setup-tables [db]
       (make-bookmark-table db))

    (defn add-bookmark [db]
       (jdbc/db-do-prepared db
           (str 
              "INSERT INTO bookmarks (id, type, title) "
              "VALUES ('12453', '2', 'a bookmark' )")))

    (defn remove-bookmark [db]
       (jdbc/db-do-prepared db
            (str "DELETE FROM bookmarks WHERE id = '12453'")))

    (facts "Testing database access to bookmarks"
       (jdbc/with-db-transaction [db testdbspec]
            (setup-tables db)
            (with-state-changes [(before :facts (add-bookmark db))
                                 (after :facts (remove-bookmark db))]
                 (fact "We can retrieve a list of bookmarks"
                     (fetch-tags :dataspec db) => [{:title "a bookmark"}]))))

This works as expected.


Dependency inversion in Clojure

The problem

I was recently reading a nice German book on Effective Software Architecture by Gernot Starke and stumbled upon a discussion of the dependency inversion principle, which got me thinking. Gernot Starke first discusses the problem with an allusion to traditional procedural programming (translation mine):

Classical designs for procedural languages show a very characteristic structure of dependencies. As (the following figure) illustrates, these dependencies typically start from a central point, for instance the main program or the central module of an application. At this high level, these modules typically implement abstract processes. However, they directly depend on the implementation of the concrete units, i.e. the individual functions. These dependencies cause big problems in practice. They prevent changing the implementation of concrete functions without causing impacts on the overall system.

Classical dependencies in procedural systems

He then goes on to introduce the idea of programming against abstractions and the dependency inversion principle, first coined in Bob Martin’s DIP article (see also another thorough discussion in Brett Schuchert’s article on DIP). Basically, the idea is that the integrating process refers only to abstractions (i.e. interfaces), which are then implemented by concrete elements (classes), cf. the next figure.

Integrate with abstractions

When I take a look at some of my recent Clojure code or at some older code I’ve written in Common Lisp, I immediately recognize dependencies that correspond to those in a classical procedural system. Let’s go for an example and take a look at one specific function in kata 4, data munging:

(ns kata4-data-munging.core
    (:require [kata4-data-munging.parse :refer [parse-day]]
              [clojure.java.io :as io]))

(defn find-lowest-temperature
    "Return day in weatherfile with the smallest temperature spread"
    [weatherfile]
    (with-open [rdr (io/reader weatherfile)]
         (loop [lines (line-seq rdr) minday 0 minspread 0]
        (if (empty? lines)
            minday
            (let [{mnt :MnT mxt :MxT curday :day} (parse-day (first lines)) ;<-- dependency!
              curspread (when (and mnt mxt) (- mxt mnt))]
            (if (and curday curspread
                  (or (= minspread 0)
                  (< curspread minspread)))
               (recur (next lines) curday curspread)
               (recur (next lines) minday minspread)))))))

The dependency here is on the concrete implementation of parse-day; you can basically ignore the rest for the purposes of this argument. Given that this was a small coding kata, this is not unreasonable (and in the course of the kata, the code changes to be more general), but the issues here are obvious:

  • if we would like to parse a weather-file with a different structure, we have to change find-lowest-temperature to call out to a different function,
  • if the result of the new function differs, again we have to change the implementation of find-lowest-temperature,
  • we also have to change the namespace declaration, i.e. we probably want to require a different module.

Clojure’s built-in solutions

The application of the dependency inversion principle is typically shown in the context of object-oriented programming languages like Java, where you use interfaces and classes implementing those interfaces to break the dependency on concrete implementations, cf. the figure above again. But as we’ll see, the principle can be applied independently of object orientation. I’ll discuss higher-order functions, protocols and multimethods as potential solutions.

Higher order functions

For starters, the probably painfully obvious option is to make use of the fact that Clojure treats functions as first-class objects and supports higher-order functions. This simply means that we can pass the parsing function as an argument to find-lowest-temperature.

(defn find-lowest-temperature
    "Return day in weatherfile with the smallest temperature spread"
    [weatherfile parsefn] ; <-- function as parameter
    (with-open [rdr (io/reader weatherfile)]
         (loop [lines (line-seq rdr) minday 0 minspread 0]
        (if (empty? lines)
            minday
            (let [{mnt :MnT mxt :MxT curday :day}  (parsefn (first lines))
              curspread (when (and mnt mxt) (- mxt mnt))]
            (if (and curday curspread
                  (or (= minspread 0)
                  (< curspread minspread)))
               (recur (next lines) curday curspread)
               (recur (next lines) minday minspread)))))))

This way, we can simply call (find-lowest-temperature "myweatherfile" parse-day) and freely substitute whatever file format and accompanying parse function we need. What does this buy us?

  • We no longer have to modify find-lowest-temperature when we want to use a different parse-day function.
  • The namespace containing find-lowest-temperature also no longer requires the (namespace containing the) parse function.

But there is also a downside: find-lowest-temperature assumes that all parsing functions it gets fed adhere to a signature that is entirely implicit: parsefn needs to take exactly one line and needs to return a map with the given key names. Higher-order functions don’t provide a solution for this per se, so in order to solve the implicit signature issue we need to look elsewhere. This is nothing Clojure-specific: assuming you’ve passed in an object either as a method parameter or via setter methods or constructor injection (cf. dependency injection), Python’s or Ruby’s duck typing basically works the same way: the caller of a method simply assumes that the callee offers a method with the right signature. It is the responsibility of the caller (of find-lowest-temperature) to provide a matching function for parsefn.
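
To make that implicit contract visible, here is a hypothetical parse function for a differently structured, comma-separated weather file (the namespace, file name and format are made up for illustration); it merely has to return a map with the same keys:

(ns kata4-data-munging.csv-parse
  (:require [clojure.string :as str]))

(defn parse-csv-day
  "Parse one line of the form \"day,MxT,MnT\" into the map shape
  that find-lowest-temperature expects."
  [line]
  (let [[day mxt mnt] (str/split line #",")]
    {:day (Long/parseLong day)
     :MxT (Long/parseLong mxt)
     :MnT (Long/parseLong mnt)}))

;; The caller decides which concrete parser to use, e.g.:
;; (find-lowest-temperature "weather.csv" parse-csv-day)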

However, this actually just amounts to moving the problem from one level to the next: now some other level has to decide which concrete parse function to use. This next level will again have exactly the same problems: it will depend on the concrete implementations of both find-lowest-temperature and parse-day (or any other parse function). If you think this through, it’s obvious that in general, at one point or another, you have code that determines which function to call and which parameters to use. The question is only whether we can use abstractions there or whether we have to use concrete implementations. We’ll return to this issue, namely that the problem now has to be handled at some other level, later.

