Fixing broken boot from encrypted devices after upgrade to Debian Bookworm

My workstation uses LUKS to encrypt two NVMe SSDs, on top of which I have lvm2 running to create one big filesystem. This has worked nicely for the last couple of years (more details follow below). But after the upgrade from Debian 11 (Bullseye) to Debian 12 (Bookworm), my system refused to boot.

More concretely, I was greeted as always with the prompt to decrypt my root device, which worked as expected — i.e. I received the “setup device successfully” message from the initramfs. However, the boot then failed to mount the resume / swap device and eventually gave up completely, reporting that it couldn’t mount the root device, along with the lovely message:

ALERT! /dev/mapper/pixie--vg-root does not exist.

Debug attempts

Initramfs debug shell

Searching the web led to a relevant-looking thread on forums.debian.net, which gave some initial ideas. In particular, I could make some progress using these two commands:

  vgchange -ay
  vgchange -ay --activationmode partial

Afterwards, my /dev/mapper/ showed the expected devices. However, I couldn’t mount them, hence also couldn’t follow the further advice there.

What is clear to me now but wasn’t at the time is that I should have exited the initramfs emergency shell (via exit) and tried to continue to boot the system. This likely would have worked and allowed me to debug and fix the issue from the machine directly.

Rescue mode from USB stick

Instead, I created a USB stick with the netinst installation image for Debian 12 and booted from the stick into rescue mode (found in the advanced options). This nicely detected my hardware and then prompted me to decrypt each of my SSDs.

That’s when I figured out that both of my NVMe devices were encrypted separately and not even with the same password. I had forgotten this, but my password store of course contained both passwords, so I could decrypt both devices.

Fixing the problem

Wrong device name

I had read in a reddit thread that Debian uses a modified crypttab(5) format, adding an option initramfs to identify the root device, resume devices and others that need handling in the initramfs phase of the boot. This option was missing, so I added it to the entries in my crypttab and tried to update the initramfs via update-initramfs -u.

Unfortunately, this threw an error about a source mismatch for one of my devices and also gave a warning that the cipher method for decryption could not be figured out. After a while, I realized that my /etc/crypttab specified nvme0n1p3_crypt as the device but that it should really be nvme1n1p3_crypt (note nvme0 vs. nvme1). I’m not sure if this error had been in there all the time (and if so, why it worked before) or if something changed with Bookworm, triggering a device name change.
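
In hindsight, cross-checking the names and UUIDs in /etc/crypttab against the actual devices would have surfaced the mismatch right away. A minimal check could look like this (with the UUIDs elided, as above):

  lsblk -o NAME,UUID,FSTYPE /dev/nvme0n1 /dev/nvme1n1
  blkid /dev/nvme1n1p3
  grep -v '^#' /etc/crypttab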

After correcting this error, I could finally update the initramfs and rebooted. Unfortunately, I ran into the same problem on boot.

Key file not accessible

This time while debugging in the initramfs emergency shell (and also searching for hints on the web), it became clear to me that my other device nvme0n1p1 was the actual culprit. In particular, I was using a key file (stored in /etc/keys/nvme0n1p1.luks) to decrypt this device automatically, which is also the reason I had forgotten about the two differing passwords for the devices.

So, I edited the crypttab once more to no longer reference the keyfile, i.e. I changed /etc/crypttab from:

nvme1n1p3_crypt UUID=[elided] none luks,initramfs
nvme0n1p1_crypt UUID=[elided] /etc/keys/nvme0n1p1.luks luks,initramfs

to the new version:

nvme1n1p3_crypt UUID=[elided] none luks,initramfs
nvme0n1p1_crypt UUID=[elided] none luks,initramfs

After another update-initramfs -u -k all, I could finally boot into my system again. Of course, I now have to enter both passwords on boot.

I’m not sure if this is a recent change, but apparently now you need to specify a keyfile pattern in /etc/cryptsetup-initramfs/conf-hook. So, I now have the following in this file:

root@pixie:/etc/cryptsetup-initramfs# grep ^KEYFILE_PATTERN conf-hook
KEYFILE_PATTERN=/etc/keys/*

After reverting the change to the crypttab above and re-running update-initramfs, the automatic decryption of the second device works again.
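
For reference, the final working setup boils down to the following (UUIDs elided as above): the keyfile entry restored in /etc/crypttab, the keyfile pattern set in the conf-hook, and the initramfs rebuilt once more:

  # /etc/crypttab
  nvme1n1p3_crypt UUID=[elided] none luks,initramfs
  nvme0n1p1_crypt UUID=[elided] /etc/keys/nvme0n1p1.luks luks,initramfs

  # /etc/cryptsetup-initramfs/conf-hook
  KEYFILE_PATTERN=/etc/keys/*

  update-initramfs -u -k all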


Conway's law and the role of managers

Allen Holub had an interesting toot (also posted as a tweet) about managers deciding team structure and software architecture:

Comic: a developer thinking about managers deciding the architecture. Conway’s Law by Comic Agile (Luxshan Ratnaravi & Mikkel Noe-Nygaard), CC BY-ND 4.0

If management is setting up team structure, they are designing your architecture. They know nothing about architecture, though. Self-organizing teams are essential. (See Conway’s Law). — Allen Holub

I mean, who am I to disagree with Allen Holub? I agree completely with his conclusion, but not with this argument. It might seem a little unfair of me to react to his tiny toot with a lengthy blog post, as I’m sure Allen could talk at length about the topic with tons of good insights, so please don’t take this as criticism of his post but as a way to express how I look at Conway’s law and the role of managers.

Out of the ivory tower

First of all, I guess it depends on who you are referring to with “management”. If Allen is talking about top level management, then I agree that they usually don’t know about your architecture. But is this also true for the middle managers close to the teams doing the work? This level of management should know about architecture in my humble opinion, to avoid making foolish mistakes and to understand what the developers are talking about.

I think managers today should know about both architecture and Conway’s law. Ideally, they should also know about the Inverse Conway Maneuver (as e.g. discussed in Martin Fowler’s article on Conway’s Law), which is simply the idea that you could structure your teams according to the desired design of your system. Assuming the managers understand the architecture of the system (both in terms of how it’s structured currently and of a future target state), I think they can make use of this strategy when organizing their teams.

Communication is key

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure. — Melvin E. Conway

Second, I think it is worth pointing out that Conway’s Law is not about how system architecture and team structure interact, but about the influence of an organization’s communication structure. Now, organizations are often re-structured with the implicit assumption that org structure matches communication structure. But this might be a bit naive, because organizations not only have a formal organization but also an informal one (see this overview of formal vs. informal organization). The latter is quite important for how communication will actually happen in my experience. So a change in the formal organization does not necessarily mean the relevant communication channels that drove your architecture before will suddenly disappear. In other words, people will remember that Eva knows a lot about the data access layer and will include her in relevant discussions, regardless of which team she’s in.

The headquarters of the Romanian Architects Association, built on the ruins of the Direcţia V Securitate, by Dimitris Kamaras, CC BY 2.0

Also, as discussed in Fowler’s article on Conway’s Law, it is questionable whether Conway’s Law applies to all organization sizes. If you have two very small teams, in which communication among all people is easy and natural, there are a lot of other options than ending up with two components. Sure, if the communication between your people is complicated, you can expect to see this as friction in the architecture. So, if Fred doesn’t like working with Eva, he might be inclined to put in a completely new data access layer. However, this might also happen if Fred and Eva are within the same group (I wouldn’t call them and their peers a team if that happens).

The Wikipedia page on Conway’s Law discusses supporting evidence that modularity of systems is directly related to how tightly coupled your organization is. So, managers shouldn’t just work on reporting lines but also on enabling good communication between the teams — while you might not be able to influence the informal structure, you can nevertheless try to set up good channels within the formal org structure. I certainly hope that you’re working in an organization which actively promotes cross-team collaboration and understands the benefit of networking, and not one in which you have to move up the org chart to find the highest shared person to convey information and to drive decisions.

End to end work

Third, managers should be aware of the idea that working in feature teams is usually better than working in component teams (cf. Roman Pichler’s more nuanced take on feature vs. component teams): the idea here is that a feature team has all the capabilities to deliver a feature end to end, i.e. it does not depend on other teams. If your system already has a good design to support your feature development, then your cross-capability feature teams should be able to use and extend it. Naturally, if your system is big enough, there might be more to know about the architecture than you can fit into the heads of your team — that’s when you’ll again need good communication channels across your organization. This does not invalidate Conway’s law, but it might ensure that a change in the architecture isn’t driven by one team’s decision alone.

Constant change is inevitable

Fourth, team and communication structure is a much richer and deeper topic that people have been thinking and writing about for what feels like forever, and only alluding to Conway’s law ignores many other insights. E.g., Team Topologies is an approach that looks at organization structure from the point of view of communication mechanisms and of minimizing the cognitive load of the people working within it. If you take a look at Team Topologies in a nutshell, you’ll see that it also tries to tackle “pushing against Conway’s law”. If you follow the route outlined in how to get started with team topologies, you end up looking at the same fundamental pieces: the architecture and the communication routes. That’s a tool that managers can use (and there are more, like Jurgen Appelo’s unFIX framework).

And then you have all the good insights from Heidi Helfand that these days we’re constantly changing our team structures anyway, cf. Dynamic Reteaming. As she discusses, our teams change for so many reasons: people leaving, people joining, specific organizational needs etc. It’s only natural that a team structure will evolve over time, just like system architecture does (Team Topologies also expects your organization to evolve over time). It sounds a little over the top to me to always pull the teams together and trigger a self-selection event in these cases (not that Allen is suggesting this). If we reduced our thinking about team structure and reteaming to the technical or system architecture view only, we would swap one mistake (ignoring the technical implications) for another (ignoring people interactions and other needs), which would arguably be even worse.

Conclusion

So, what does that leave us with? That managers should move people around like chess pieces, like they’ve always done, and that’s that? No, of course not. As I said, I’m very much in agreement that self-organizing teams are essential. The people who are close to the work should have an important say in discussing the organizational structure. They can bring in the knowledge about architecture and technical dependencies, and they will also bring in their preferences and personal needs, all of which is very important. But I believe managers need to be involved, too, to clarify other needs that have to be considered (e.g. business needs, future organization plans like hiring, etc.). Plus, you don’t want to burden engineers with having to handle the constant re-teaming all the time, because engineers usually don’t like it — otherwise they would have moved into management already.

Oh, and finally there’s a nice slide deck from Eberhard Wolff why he’s tired of Conway’s law which you should really click through if you’ve read this far.


It's okay to be weak

Martin Fowler writes: “I should also mention that I suspect I’m not as energetic as I used to be as I age. I’ve long known that when you’re doing very creative work, such as writing or programming, the useful hours you can do in a day is rather less than the accepted industrial eight. I’ve always been nagged by my conviction that I’m not working as diligently or effectively as I ought to be. Sadly I’m not getting any better at not letting that bug me.”

I love how he openly admits his weakness.

Let me clarify that I don’t mind whether his weakness is real or just perceived. I think it sets a good example if a public figure like Fowler, who’s written quite a number of highly influential books and given too many great talks to list, openly shares his inner feelings.

In today’s work environments, there is this illusion and pressure that you have to constantly perform at the very highest level for at least eight hours a day. This is hurting so many people, up to the point where good folks drop out of their jobs due to burn-out and depression. So, it’s important to recognize that we are all human after all and that the performance we are capable of is not constant. What we can deliver at work depends on so many factors, and many of them are actually outside of our control. We just have to accept that.

If in our daily work life we can all be a bit more mindful that we are collaborating with other humans and not robots, if we are open to actively listen to what is going on in the life of the people around us, and be able to connect this to our own struggles with daily delivering what we think we should be capable of — maybe then the overall work experience will improve for all of us.


Make it so! Decision making in software architecture

I find it interesting in discussions with developers that many have a bad picture of the job of a software architect. When thinking back to my first encounters with a software architect, who was performing his document reviewing job in a dedicated architecture department, I can totally relate, but then again it is 2021 and things should have changed [1]. After all, most developers have a very keen interest that the architecture of the system they are working on is good, but still some developers do not seem to be aware how much their work is impacting the architecture (and vice versa). “Every interesting software-intensive system has an architecture. While some of these architectures are intentional, most appear to be accidental”, as stated in Grady Booch’s article on “Accidental Architecture”. Some people now seem to believe that accidental architecture is the new norm, as “the best architecture, requirements and design emerge from self-organizing teams” (from the principle in the Agile Manifesto). In contrast, in organizations which have dedicated software architects, some people seem to think that it is the architect’s responsibility to ensure that there is a well informed design and to make the hard decisions as necessary. These hard decisions would probably be about “the important stuff (whatever that is)” that is called out in Martin Fowler’s article “Who needs an architect”.

But what does that mean for the daily work: should developers send all questions to the architect for her to make decisions, and should architects sit in their ivory tower pondering them and ordering decisions by importance? Not in my view of the world. I’m a big fan of the role model that Fowler describes as Architectus Oryzus, or that acts as primus inter pares as in Gregor Hohpe’s article “Agile and Architecture: Friend, not Foe”, in which the architect works closely with the developers and tries to identify important topics and make sure they are getting addressed at the right time.

A Design Thinking workshop. Sketch by Jose Berengueres, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons.

Stefan Toth has a fantastic (German only) book on agile processes for software architecture, Vorgehensmuster für Software Architektur, which describes patterns for how to make this happen in daily practice. The core point of the book is that architecture should be a shared responsibility. Everybody should try to raise and address important concerns — whether there is a dedicated architect role or not is a separate issue. Toth lists patterns like identifying quality scenarios, listing architecture topics in the backlog, ad-hoc architecture meetings, common decision making and testing for architectural concerns (and many more), many of which are very team-centric. The book is so great because it shows that it is easy to make architecture work a natural aspect of the development workflow and not some high ceremony.

In my experience, technical discussions are often concerned with architectural topics, so you might not even need to bring people together for formal architecture meetings — nevertheless technical design review meetings can be beneficial if they involve people that can and do share new perspectives but who would otherwise not be working so closely with the team. Similarly, if your team already adds technical debt and other non-functional items into the backlog, you probably are already following two of the recommended practices.

The role of the architect is then not so much that of a decision maker but more that of the person taking responsibility that architecture aspects are considered by the team. In bigger organizations this implies that the architect should be working closely with the development team, but not only with them. While ideally the developers are also closely collaborating with any other important stakeholder (product owner, devops / operations, other dev teams), it is quite vital for an architect to understand the different (and potentially conflicting) quality needs of these parties. Again, rather than having e.g. 1:1 meetings between the architect and the security officer, so that the architect can then point out operational constraints to the development team, it is probably even better if the architect makes sure that a security expert is involved in design and requirement discussions directly.

Hippo with tongue stuck out: the highest paid person’s opinion. Image via Pixabay.

However, the job of an architect does not only consist of organizing meetings with the right people. In the end, intentional architecture needs decisions. “Common decision making”, as listed above, means two things to me: first, people who are affected by a decision (e.g. the developer who has to implement the result) should be directly involved in the decision making process. In teams that do not have a dedicated architect this naturally seems to be the case, but it is worth pointing out that it can easily degrade into single-programmer decisions. The other anti-pattern here would be that the architect, likely to be the highest paid person in the room, makes a HiPPO decision or specifies a design while at the same time not being close enough to the problem and / or the existing code base. [2] The effect here can be devastating if the decision turns out to be wrong: not only will people think that the architect did not listen to the team, but also that the architect might not have the necessary competence. This can result in serious psychological safety and trust issues. A good working approach to avoid such problems is running small experiments (spikes) to validate options.

The second aspect is that a decision is made in the first place, which is even more important. The idea of deciding at the last responsible moment notwithstanding, I find it surprising how hard it can be for organizations / groups of people to make decisions. It is important for an architect to be able to guide a group towards making a decision, but also to be comfortable making a decision themselves if the group cannot reach an agreement. The biggest issue here are the trade-offs of potential solutions: e.g., one solution might have benefit A (e.g. scalability) while being weaker in B (e.g. ease of use), and another solution has the opposite characteristics. Often different people will have different opinions on whether A or B is more important in the selection process. It is very helpful for teams to come to an understanding (not agreement) about these different weights that people give to the characteristics, and then hopefully the architect can explain why she thinks that one argument is more important than the other. Coming to a decision is more important than convincing everybody, though: aim for solutions that everybody in the team can agree with, even if they would still prefer a different one. Also, it is a good idea to document decisions in a lightweight fashion (e.g. in a small Architecture Decision Record).

If the developers run into obstacles during implementation, it is quite likely that they will figure out a different solution than the one decided on. This is another case where the “architect as lonely decision maker” can lead to a lot of unnecessary friction. In practice, the choice of a different approach is not a problem as long as it is handled in a similarly responsible way, where the developers take a look at the implications and trade-offs of their choice and involve others (and, please, update that ADR).

Footnotes:

[1] I am obviously not talking about so-called Enterprise Architects, who are mostly concerned with governing how the many systems at a company interact to provide the most business value. Instead the focus here is on software architecture for a single system (which, of course, is likely to interact with many others).

[2] Sometimes the opposite scenario might be true: the team consists mostly of people who do not have enough expertise, whereas the architect has long experience with the code base. In this situation, the architect can help the team by sketching a design approach and sharing his knowledge. Still, there should be room for questions and further ideas from the team.


Angela Merkel, HiPPOs and decision making

So Angela Merkel yesterday withdrew the decision to add another non-workday before Easter (cf. deutschland.de video of Angela Merkel’s press conference), apologizing and taking full responsibility for the half-baked idea. Many people paid her respect for this and so do I. Some thoughts on this, though:

First of all, this half-baked idea was the result of a meeting going on for far too long. As some MP said, they first heard of the idea at 23:45 in the evening, after the discussions had been running for many hours. As most knowledge workers will know, a typical effect of working too hard for too long is that the quality of your outcomes diminishes. Too often, reflecting on the result of yesterday’s overnight session will reveal that your fabulous idea is actually totally off.

Second, apparently no experts were around who had any idea of what it would entail to make this additional day off a reality. This looks a lot like a decision based on “HiPPO”, the Highest Paid Person’s Opinion. Generally speaking, results are better if experts prepare options with effort / benefit arguments. And usually, it’s better to postpone a decision to “do it this way” until you know that it actually can be done that way.

Last but not least I find it a pity that the behavior of Angela Merkel, to openly admit a mistake, is apparently exceptional and surprising to many people. Admitting mistakes and taking responsibility should be the norm for any leader and not the exception. As a leader, you want to understand the problems your team is facing and what mistakes people are making. How else could you improve your results? But if you are not leading by example and are not open about your mistakes, it is likely that your people might have the impression that it is only okay to talk about the great accomplishments. This in turn means that everybody is afraid to make any mistake, trying to reduce risks at all costs, which will kill any positive attitude and atmosphere. Good luck trying to achieve great results in a context like that.

tagesschau.de article on Angela Merkel’s press conference to withdraw the day off before Easter


Color is the key -- configuring my Dygma Raise keyboard

In 2020, I finally came to the point where I was ready to invest some money into my health. So I bought a standing desk and a new keyboard, a Dygma Raise, which is an ergonomic split keyboard with mechanical keys. The main reason to buy the Dygma instead of some other ergonomic or mechanical keyboard was that I wanted to be able to keep my arms positioned straight when using the keyboard — with a regular keyboard I always need to move my arms a little inward in front of my chest, which leads to a body position where I’m not keeping my back straight, which in turn contributes to back pain. As you can see from the image below, I move the two parts considerably apart from each other, to the extent the rather short connection cables from the keyboard to the so-called Neuron allow it. Fortunately these are just USB-C to USB-C cables, which means I can simply buy two longer cables when I want to (didn’t get a round tuit yet).

Picture of my dygma raise

Configuring the Dygma Raise

The Dygma Raise is a highly configurable keyboard. Not only can you order various layouts (ANSI, ISO) or colors, you can also order different key switches and change keys as you see fit — yes, I’m talking about the hardware keys here. The “my raise” page gives a nice overview. I ordered a German layout and Cherry Brown keys (cf. the nice switch overview), which give nice tactile feedback without being too loud. I have not fiddled with the keys themselves so far, but would like to talk about the various key layers and their configuration via Bazecor, which works on Linux, Mac and Windows. I have only used Bazecor on Linux (Debian 10), where they provide an AppImage which so far has been working well (minor point: the user has to be in the dialout group, otherwise you need to start the application as root).
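
In case it helps, adding the user to the dialout group on Debian boils down to something like the following (log out and back in afterwards for the group change to take effect):

  sudo usermod -aG dialout $USER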

Initial impressions

My initial impression, coming from a pretty regular Cherry G86, was: “Great haptic feeling, but oh my, this will take time to get used to.” During my initial attempts to use the keyboard I recognized how often I apparently had been looking at the keyboard before: when I moved the two halves apart and did not look at the keyboard, I would utterly mistype. I overcame this by practicing to type blindly with the two parts put together, so that the typing was more akin to what I was used to. Over time, I got better at this and can now type well without having to look at the split halves.

The other big problem for me was that the default key layout has no cursor keys on Layer 0 at all; the cursors are on Layer 1 on the keys W, A, S, D. To get to Layer 1, I could either use the transient Shift to Layer 1 key (the left middle lower thumb key) or the persistent Lock to Layer 1 (the right middle lower thumb key). So, what was always a single keystroke now required two key presses in a totally different area of the keyboard. Add to this the use of all sorts of modifiers that I’m constantly using, e.g. C-M-left to switch screens, and now I have to press another key to switch the layer? Not nice. I similarly missed Page-Up and Page-Down, which were not even configured on Layer 1. And then, what do I use instead of the media keys I used on the Cherry? That should be easy to configure, right?

Goals

It became very clear to me very quickly that I would have to adjust my typing behavior. However, I currently use two machines in parallel and I only have one Dygma Raise. Obviously, I need to configure replacements for at least some of the keys that are missing in comparison to my old keyboard.

The other thought was that I use a lot of key combinations in the various programs I use, and here I have ten layers to configure, so surely I should be able to ease some of my typing? I.e., in Emacs, instead of hitting C-M-SPC C-w to kill the current S-expression, it would be nice to just hit a shorter key combination.

And finally, the Dygma Raise has colors like a rainbow, so …

  • keep layer 0 configuration close to a normal PC-105 layout
  • use layer 1 for missing keys, especially for the media play/stop key
  • use layer 1 and others for shortcuts / special needs
  • setup consistent and meaningful color usage

After some time, I settled on using three layers like this:

Layer 0

As stated in the goals, layer 0 is set up to be pretty normal, see below. E.g., I set all four “space” keys to yield space, so this is like a single big space key. Also, different from the pains described in Kari Martilla’s blog about his Dygma Raise, there are no changes to the parentheses whatsoever, although they are placed equally problematically on the German layout as on the Nordic layout.

To address the missing cursors, I configured the lower row of thumb keys for the cursors, which makes their use very easy, as I only need to hit a single key that is also very easy to reach. For switching virtual desktops in LXDE/Openbox, I use C-left or C-right, and this is even easier now, as I can move to the right with Right-Control, too, where before my right hand would have had to wander off to the cursor block.

Layer 0 configuration

One key difference is the Escape key: on a PC-105 with German layout, that is where you would find the caret (^), but I decided to keep Escape bound to this key. The caret can instead be found on layer 1. And then there are the Dygma and the FN keycaps: they are bound to Shift to Layer 1 and 2, respectively.

Note also the color usage which clearly demarks group boundaries, e.g. between the space and alt keys.

Layer 1

Layer 1 mostly holds the keys which I need often but for which there is no room on Layer 0. I.e., the caret is on the Escape key and the function keys (which I actually only very rarely use) are on the number keys. For the remaining movement type keys, my line of thought went like this: Q and A go to Home and End, W and S to Page Up and Down, because that positioning resembles the key ordering of these keys on a regular PC104/105 layout. Delete and Insert, however, are needed more often and hence can be reached via the two lower left thumb keys.

Layer 1 configuration

I set the media keys (next, previous, louder, quieter) to N, P, +, -, respectively, which makes them easy to memorize. The start/stop toggle is bound to FN, so that (on layer 0) I can simply hit the Dygma+FN combination.

You can also see that Menu, LED cycle and Move to Layer 1 are configured on this layer, but I don’t use these keys much. I’m currently experimenting with additionally putting the parentheses on layer 1 on K, L, Ö, Ä, to see if that makes hitting the brackets [, ], (, ) that I need most often for Clojure programming easier than on the regular keys, but I’m not sure yet how well that will work.

A lot of keys are configured to produce nothing, grouped in the default color of the layer, whereas the configured keys pick up the colors from layer 0 plus the dark blue for the media keys.

Layer 2

With Bazecor versions beyond 0.22, it is now also possible to configure macros, i.e. sequences of keys. Initially that wasn’t working for me: you have to upgrade the firmware of the Dygma to enable it. That’s the main purpose I started to configure a layer 2 for. Currently I have configured only two such macros: Q (i.e. Fn-Q when I’m in layer 0) triggers a sequence of keys that, when pressed in a Lisp mode in Emacs, will select the current S-expression and indent it. It is bound to Q as this is somewhat similar to what hitting M-Q does (formatting a paragraph). W triggers a key sequence that in Emacs will select the current S-expression and kill (cut) it, similar to C-w.

After I had set this up, I thought about doing this with a regular Emacs keyboard macro or Elisp function instead and then invoking those with the keys. This would offer two benefits: I could assign this just to the modes where this combination makes sense, and I would have the nice side effect that invoking these functions can be done independently of the Dygma being in use. As it is, if I now hit Fn-Q while not being in Emacs, the application in use will receive the defined key macro sequence, and who knows what happens then. Then again, the entire point is to make use of the Dygma to reduce the number of keys to be pressed.

I also set up ‘-’ to generate C-_, which will trigger an undo in Emacs, thereby on my German keyboard saving me one key stroke (Fn-<-> instead of C-Shift-<->).

Layer 2 configuration

I also configured the Dygma key to toggle media start/stop, so that it doesn’t matter if I shifted to Layer 1 or 2. And, finally, there is an experimental alternative configuration of some movement keys (Home, End, PgUp/Down) on the thumb keys.

Conclusion

The Dygma Raise is an expensive keyboard, no doubt. Did it help with my back pain? Not so much, unfortunately. But it is an amazing keyboard that can be configured in lots of ways. Would I recommend it? I surely would. Oh, and did I mention that it comes with a case so it can be carried around? Ah, keyboard nerds galore.


Who Am I and If So How Many? Using multiple Firefox profiles.

So, I’ve changed my Firefox setup quite a bit over the last months. I now work with multiple profiles, e.g. one for online banking, one for social media use, one for development purposes and one for “default” use (i.e. pretty much everything else). I’m quite happy with the experience.

Usually I have three of these profiles running in parallel, showing on different virtual desktops of my Linux box. I.e., on desktop 2, I usually have my development environment open, so I’ll open the “development” browser there; desktop 4 runs all the applications for interacting with other people (mail etc.), so “social” goes there; and the default one will show on virtual desktop 5.

Why would you do this?

One direct effect is that the number of open tabs in any given browser window is a lot smaller and also better grouped than before. I.e., any web page I need for my current development project I will only open in my “development” profile. I will only open my bank’s page in the banking profile, so this browser window will never show anything else.

The other main benefit is that I can have profile specific configurations. E.g., I nail down my “default” profile with NoScript which is not really useful on my “development” profile, whereas I don’t need e.g. the React Dev tools on the “social” or “default” profile.

Dedicated profiles can also help with security, e.g., using a dedicated profile can lower the attack surface for online banking: When you don’t browse to other sites with the same browser/profile, any XSS/CSRF issue on these “other” sites for sure can’t affect your online banking connection.

The profile I use exclusively for online banking is also highly locked down and in addition uses a different theme, so that it is visually obvious that I’m working with this profile.

Get me started, please

To start using multiple profiles, you have to run Firefox with the -P switch, which will start the Profile Manager that allows you to create new profiles, delete profiles etc. Alternatively, if you have Firefox already running, browsing to about:profiles will also allow you to manage your profiles.
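
For illustration, this is roughly what creating and starting an additional profile looks like from the command line (the profile name “social” is just an example):

  firefox -CreateProfile social      # create a new profile named "social"
  firefox --no-remote -P social &    # run a separate instance with that profile
  firefox -P                         # open the Profile Manager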

For a while, I just started the non-default browsers manually from the command line, by opening an xterm and running firefox --no-remote -P social &. But I finally created some additional local .desktop files (cf. the Arch wiki page on xdg desktop files), so I can start Firefox from the desktop. I.e., I added a file $HOME/.local/share/applications/socialbrowser-usercreated.desktop with the following content:

[Desktop Entry]
Name=Social Firefox
Comment=Browse the World Wide Web
Comment[de]=Im Internet surfen
Exec=/usr/lib/firefox-esr/firefox-esr --no-remote -P social %u
Terminal=false
X-MultipleArgs=false
Type=Application
Icon=firefox-esr
Categories=Network;WebBrowser;
MimeType=text/html;text/xml;application/xhtml+xml;application/xml;application/vnd.mozilla.xul+xml;application/rss+xml;application/rdf+xml;image/gif;image/jpeg;image/png;x-scheme-handler/http;x-scheme-handler/https;
StartupWMClass=Firefox-social
StartupNotify=true

This will then create a menu entry in the “Internet” submenu in my application starter menu in my desktop environment (because that’s where the given categories will create entries).

Any drawbacks?

One annoying thing is the profile selection that Firefox pops up when you don’t specify a profile on the command line. If you select the option “Use the selected profile without asking at startup”, then it will not be easy anymore to use a different profile — the only way then is really to use -P again.

This default profile selection becomes a problem especially when you want to open a link from a different application (e.g. from your mail program), because then you can’t decide which of the running browsers/profiles will open the link; it will always try to use the default one. I have seen varying behavior in what Firefox does when you don’t select a default profile: sometimes it just picks one running profile successfully, but I’ve also seen it open the profile selection dialog again. In that case, if you select a profile that is already in use, Firefox will handle it like an attempt to open the profile a second time, resulting in an error. My current workaround to this particular issue is to set the default browser to Chromium via xdg-settings set default-web-browser chromium.desktop.
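
The corresponding commands for checking and changing the default browser look roughly like this (the chromium.desktop name may differ depending on your distribution):

  xdg-settings get default-web-browser
  xdg-settings set default-web-browser chromium.desktop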

The other hassle working with multiple profiles is bookmark management, as I want some bookmarks only local to one profile but most should be shared. I can use Pocket for the shared ones, of course. However, I often just copy&paste the URL manually to the “default” browser which serves as the main bookmark keeper. I really should move away from this completely and instead use the bookmark extension of my Nextcloud installation.

Overall, for me the benefits clearly outweigh any drawbacks.


File download with ClojureScript

As I couldn’t find a recipe on how to provide some data from a ClojureScript application for download, here’s how. If you already know how to do this in JavaScript and have done any CLJS-JavaScript interop, there’s nothing new for you to learn here, as this is a pretty straightforward translation of how to use the Blob API and click a temporary link in JavaScript.

;; assumes [cljs.pprint :as pp] is required in the namespace
(defn file-blob [datamap mimetype]
  ;; pretty-print the data and wrap it in a Blob of the given MIME type
  (js/Blob. [(with-out-str (pp/pprint datamap))] #js {:type mimetype}))

(defn link-for-blob [blob filename]
  ;; create a temporary anchor element pointing to an object URL for the blob
  (doto (.createElement js/document "a")
    (set! -download filename)
    (set! -href (.createObjectURL js/URL blob))))

(defn click-and-remove-link [link]
  ;; dispatch a click on the link to trigger the download, then remove the link again
  (let [click-remove-callback
        (fn []
          (.dispatchEvent link (js/MouseEvent. "click"))
          (.removeChild (.-body js/document) link))]
    (.requestAnimationFrame js/window click-remove-callback)))

(defn add-link [link]
  ;; appendChild returns the appended node, so the link can be threaded onwards
  (.appendChild (.-body js/document) link))

(defn download-data [data filename mimetype]
  (-> data
      (file-blob mimetype)
      (link-for-blob filename)
      add-link
      click-and-remove-link))

(defn export-data []
  ;; example on-click handler; assumes the data lives in a state atom
  (download-data (:data @some-state) "exported-data.txt" "text/plain"))

I’ve tried to break it down into pretty self-explanatory pieces, but here is a bit of explanation: export-data would be used as an on-click handler on some UI element and would be expected to gather the data for export in some way. Here, we’re just assuming the data is already stored in some state atom and is a map. file-blob pretty-prints the data, declares the content to be of the given MIME type (e.g. text/plain) and then returns the newly created blob. Of course, you might want to change the pretty printing or the MIME type, depending on your data.

We’re not doing anything fancy with the created Blob object, we simply hand it over to URL.createObjectURL whose result we use as the href attribute of a newly created anchor element. Setting the download attribute will tell the browser not to navigate to the URL. After this we simply add this new link to the DOM and then execute the download by dispatching a MouseEvent on said link. This actually triggers the download, i.e. the browser will open up a file save dialog with the suggested name, so the only thing left to do is to clean up the link from the DOM.


Stand and deliver -- how can a team become quicker?

Recently I met with an old colleague of mine and we talked about teams, among other things. He asked me what I would do to urge a team on to become quicker and to deliver more. My gut reaction was that this is a fine line to walk. The following delves a bit deeper into the issue.

What not to do

One particular pitfall is to ask the team to deliver more story points. I guess the motivation behind the idea is two-fold:

  1. Delivered Story Points are easy to measure.
  2. The team could work harder and then easily deliver more.

One of those assumptions is, generally speaking, false, and it’s not the one about measuring. What I usually see is not people slacking off but people struggling.

But it’s not only that this core assumption is wrong; it gets worse when you consider the likely effects of asking for more story points. First of all, note that it’s very easy for the team to deliver more story points: the team can simply raise their estimates for each story. They can raise the estimates on basically every estimation, thereby gaming the system and destroying any trust you can have in the estimates. This essentially means you’re losing the velocity as an indicator of what the team can reliably deliver in the future.

The second effect is even worse: if people feel pressure to move more stories to “done”, they will concentrate on delivering the core value — this being the functionality that the client / product owner was asking for. On the flip side, they will stop caring about quality, because that is apparently not the main driver. As a result, people might stop doing the following things, because they cost time, time they can’t spend on finishing the next story:

  • testing thoroughly
  • covering the changes with automated tests
  • cleaning up the legacy code
  • refactoring your first “working” implementation into a maintainable shape

The biggest problem with this is that the effects of all this will usually not affect the current sprint, but will probably cause severe slow down later on, either because of bugs that need to be fixed later or because all the technical debt will make any further changes so much more complicated. Let me repeat this: if you don’t care deeply about the quality of what you build, the effects will slow you down, maybe not today, but certainly tomorrow.

Don’t ask for more points, instead drill deeper to figure out what is holding back the team and focus on fixing that.

Why the team can’t move faster and what to do about it

My friend then came up with the idea that it might be viable to look beyond the usual Scrum process, maybe consider Kanban or something else. My perspective (and reaction to him) is that he was on the wrong track. Yes, Scrum does add some rituals that some people consider to be unnecessary baggage that don’t add value. In my experience that’s either because the team in question is actually already a highly experienced team that delivers very successfully or because the process parts are executed so poorly that they don’t help the team. But usually the problems are elsewhere and simply changing the framework you’re using will not fix those.

Process problems

Let’s be clear: yes, in some cases you actually have a problem in the way work flows through your system. One classical scenario is that all testing work only starts at the end of a sprint, when all the implementation is done, or that you have too many people working on items in parallel, unable to finish anything early. You are basically setting yourself up to find bugs too late to be fixed and, very often, to have left-over testing-only work that needs to be completed in the next sprint.

If you’re using any method to inspect and adapt (e.g. regular retrospectives), this is one of the things you would expect the team to be able to identify themselves and also to take action on. Some guidance from an experienced Scrum Master can be really valuable here, she could for instance use an activity to identify the value stream through the team. Also, ideas from various frameworks can help, e.g. setting a work-in-progress limit as done in Kanban.

The other classical case of process problems is where the organization puts up several hurdles in front of the development teams without even recognizing it. This could range from the forced use of ineffective tools or underpowered machines, over tons of bureaucracy, to outdated micro-management practices. These process problems are far from easy to tackle and usually require a lot of help from somebody outside of the development team (e.g. the Scrum Master or some manager).

So, yes, sometimes your process is a problem, but it’s unlikely that it’s a question of Scrum or Kanban (or something else).

Worthless work doesn’t motivate

If your team doesn’t seem to have high spirit to tackle the next job, it’s probably not related to your organization. If your product is really worthless, no one will be motivated to put in any effort. I have a good friend who was literally asked to add piles of sh*t to a game his employer produced. This was a rather short lasting work relationship. I guess such an experience is the exception but nevertheless in some cases people see no value in what they produce.

Let’s assume your team is working on a genuinely useful product. This does not mean that it’s obvious to everybody that the next version or feature serves a useful purpose, too. Sometimes the people who know best why delivering a product to the customer is important are the same people who don’t understand the need to explain this to the development team. But this is vital for motivating the good people that are building the product in the first place. Ignoring this often goes along with a mindset of seeing the development team as a feature factory, a black box into which you feed specifications and which will spit out the next version of the product on the other side.

The problem is that on the inside of the factory the value of the work remains in the dark, of course. Every feature that needs to be built is just another feature, all different but also all the same. As a result, team members will get the impression that it’s actually not all that important what exactly they are building, outside of meeting the deadline, of course. Why there is a deadline in the first place, what happens if it’s missed, or whether the actual customer problem could be solved better with an entirely different solution, is a problem of other people “above my pay grade”. Of course, if the team doesn’t understand the value of what they’re building, how could they care about delivering it with better quality or with higher efficiency?

I’m convinced that the root cause of this issue is that there is no collaboration between the product managers and the development team. Try to live up to the idea that “Business people and developers must work together daily throughout the project”. It’s hard to just throw requirements over the fence without any further explanation if the team can ask all sorts of questions every day in the stand-up. Day-to-day collaboration and just spending a lot of time together (aka “socializing”) will also help with the insight that the development team consists of fellow humans with brains and a genuine interest in their work mattering.

The other answer to this wide-spread problem is to make sure that at the start of a project (release, product increment, sprint etc.) the person responsible for the success and the money explains the vision and the goal that needs to be met. If you can back up this story with customer input or other market research, all the better. When explaining and discussing things (the what), keep referring back to the goal and the vision (the why). This will lead to better discussions, better solutions and almost certainly to higher motivation among the team.

Technical debt

“If your code is crap, stickies on the wall won’t help” — Rich Rogers

The truth of this quote can’t be overstated in my experience. The more technical debt you accumulate, the slower your team will become as it struggles to understand all that crap. Also, code that is hard to understand makes it easy to introduce new bugs, and each new bug means that your team will not be working on new features. Bad code will also demotivate your people, as they feel like Don Quixote fighting an endless battle against windmills that constantly throw more bad code into their faces. There is also the broken-window syndrome: if people get the feeling that everything looks awful and is broken already, they may conclude that it is alright to put in even more spaghetti code.

“Continuous attention to technical excellence and good design enhances agility” — Agile manifesto

I think it’s a shame that the focus of agile has shifted from methods like XP to frameworks like Scrum or SAFe which don’t have the same emphasis on technical excellence. Don’t get me wrong, I know that a lot of the techniques which were promoted in XP like TDD are now considered state of the art for software development, regardless of the process for work organization you are using. But the thing is that it’s too easy (for people like the Scrum Master) to concentrate on getting stuff “done” and other core pieces of such frameworks and to let the technical excellence slip completely out of focus.

The right action to take is to make people constantly aware that good technical solutions which are maintainable will make life (or at least work) so much easier and nicer. Use a Definition of Done which is focusing on quality and maintainable code. Establish pair programming and code reviews. Make sure you get technical debt items into the backlog and that they are also prioritized appropriately.

Note that this is not about “gold plating” and spending endless hours on polishing code. But honor the Boy Scout rule that you should leave any place you visit a little bit cleaner than before.

Lack of expertise

One related problem is that you might have people in the team who don’t have all the necessary expertise that would allow them to work “faster” or to find high-quality solutions. This usually comes in two flavors:

  • lack of general development know-how (e.g. SQL, programming language, design patterns etc.)
  • understanding of the existing code base

Both are usually problems which will become smaller over time, because your beginners will gain more experience in the fundamental technology and also in the code base on which they are working. Be aware, however, that it’s easy to reach a level where people have learned just enough to be dangerous. That’s difficult because they can now produce something that “works” but is not of the required quality. It’s easy to stay on this low-level plateau of understanding when there is no mentoring / coaching beyond this level.

The key idea to solve this problem is to actively invest into the expertise of your team members. One idea is to have dedicated training sessions addressing the required know-how areas. However, getting a theoretical introduction into a topic is usually not enough and should only be the starting point for getting hands-on experience. So, I think it is crucial that the overall team understands that the more experienced people can fundamentally help the less experienced people while doing the real work — by patiently answering questions, explaining concepts and reasons, by pair programming and general technical coaching. Note that this isn’t a one-way road (from senior to junior) either: this becomes a lot easier if the inexperienced people are confident that it’s okay to ask for help and to admit failures without having to fear retaliation — which brings us to the next point.

Team conflicts

The agile manifesto prefers “Individuals and interactions over processes and tools”, but it’s so much easier to focus only on the latter. Especially technical people are often too happy to ignore people problems (maybe that’s why they chose to go into tech in the first place). Team conflicts can come in various flavors, e.g. some people might not want to work together, maybe there are blame games being played, or bad rumors are being spread behind people’s backs. You don’t have to be able to recite the proverbial Five Dysfunctions of a Team to understand why deep conflicts within a team will be a fundamental blocker on the way to more effective and efficient results. Emotions will get in the way of any discussion, technical or otherwise.

While it is maybe obvious to have this insight into the damaging effects, actually figuring out that there is such a conflict in a team might not be so easy if there is no open fight. Fighting openly is very often avoided by all involved parties, in the name of so-called professionalism. Even if it becomes clear that there is a conflict (or multiple), the team might be very reluctant to actually discuss or resolve it. This is directly related to the conflict, of course, because people need at least some feeling of psychological safety (and ideally also trust) to be able to address problems openly.

I think it’s rare that deep conflicts within a team are resolved without involvement from the outside, regardless of whether the external person is a manager or just another colleague. This takes a lot of patience and even more empathy. An important ingredient here is to make sure that the emotional aspects of the conflict actually come to light.

Ironically, the prime directive for retrospectives, which is intended to remind people to be respectful, can sometimes be a hindrance to openly discussing the conflict. People can misunderstand “believe everybody did the best job they could” to mean that no criticism should be voiced at all, when it is actually supposed to build the trust needed to speak up openly about problems. Activities like Spot the Elephant can help here. In my opinion, it is okay when this results in hard discussions and hurt feelings and when emotions become apparent. However, it is the responsibility of the moderator to ensure that these emotions don’t get in the way of a discussion of solutions. Ideally, the involved people should be the ones who come up with the solution. Unfortunately, in the case of deeper personal conflicts, the involved people often have no idea how to separate factual discussions from emotional aspects, or have no experience in how to give or react to feedback without deepening the conflict. Moderation by some experienced “outsider” is key here. Of course, in some cases, the best resolution might be to change the setup of the team, but this amounts to losing the game.

Keep it simple, stupid

In my experience, the main problem in software development is rarely the technology or the main process, but almost always people and their interactions. Note that technical debt also comes either from inexperience or from people not collaborating enough to point out the rising quality problems. So, if you think the process is the problem, I would recommend taking a deeper look.

ObTitle: Adam & the Ants, “Stand and Deliver” from “Prince Charming”


Threadripping under Debian

Getting a Threadripper machine to work under Debian

After a long, long time of more than eight years, my old machine started showing hardware problems: first the power supply failed and had to be replaced. Next the CMOS battery died. It became clear that finally a replacement would be in order. When the first reports about AMD’s Ryzen family came out, and in particular the Threadripper tests, I became interested. In the end, I waited until the Threadripper 2950X was available before ordering a custom-built machine from a local vendor. Here is the basic hardware setup:

  • CPU: Threadripper 2950X (16 cores / 32 threads)
  • Mainboard: MSI X399 Carbon
  • Memory: 32 GB
  • GPU: Nvidia 1060
  • SSD: Samsung 970 Evo NVMe 1TB

This post describes what I had to do on the software / Linux side to bring the machine to a usable state.

Basic installation

As a Debian user of old, and knowing that the then-current Debian stable (Stretch) would not provide a recent enough kernel, I installed Debian Buster. This went more or less flawlessly. However, I can’t help but note that the installation of LVM and disk encryption (i.e. lvm2 on top of LUKS via cryptsetup) provided by the Debian installer is really bad. It hasn’t improved one bit in ages: the automatic partitioning will come up with a crazy sizing scheme, and there is absolutely zero useful support when you go manual. It’s easy to end up with an installation which looks like it succeeded, but rebooting lands you on the rescue console because mounting the root volume group fails.
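For reference, going manual essentially amounts to creating the LUKS container and the LVM layout by hand; a rough sketch (the device name, volume group name and sizes here are just assumptions for illustration):

    # create and open the LUKS container (device name is an assumption)
    cryptsetup luksFormat /dev/nvme0n1p2
    cryptsetup open /dev/nvme0n1p2 crypt_root

    # put LVM on top of the decrypted device
    pvcreate /dev/mapper/crypt_root
    vgcreate vg0 /dev/mapper/crypt_root
    lvcreate -L 32G -n swap vg0
    lvcreate -l 100%FREE -n root vg0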

The only other noteworthy thing is that I directly installed the proprietary Nvidia drivers, because the free nouveau driver doesn’t really support any of the “newer” and more advanced 10X0 cards from Nvidia (a sketch of the install follows below). As I am an LXDE user, that was of course the desktop environment I installed. However, I installed LXQt and KDE along with it, to also learn about their current state, but quickly went back to LXDE — LXQt is supposed to be the successor of LXDE, but it’s currently not really in a comparable state.
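On Debian, installing the proprietary driver boils down to pulling in the nvidia-driver package, assuming the contrib and non-free sections are enabled:

    # requires contrib and non-free in /etc/apt/sources.list
    apt install nvidia-driver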

I also installed Gnome briefly, only to discover that it tries to use Wayland, which apparently has problems with the Nvidia drivers. I couldn’t make it work, and as I was never a big Gnome fan anyway, I simply threw Gnome out again.

systemd-udevd crashes and kernel compilation

With the system up and running, the first problem I noted was systemd-udevd constantly crashing. This also prevented suspend/hibernate from working. A longer internet search finally revealed this systemd-udevd bug report in Red Hat’s bugzilla. Apparently “the BIOS/firmware is advertising it supports SEV, when in fact it doesn’t”, where SEV is AMD’s Secure Encrypted Virtualization technology. If the kernel config option CONFIG_CRYPTO_DEV_SP_PSP is set to yes, the kernel will try to use it, and apparently most distribution-provided kernels set it — the one from Debian Buster (4.18.0-2 at the time) surely does.
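Whether your installed kernel enables the option can be checked directly against its config file, e.g.:

    # check whether the installed kernel config enables the PSP driver
    grep CONFIG_CRYPTO_DEV_SP_PSP /boot/config-$(uname -r)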

There are apparently three ways to fix this issue:

  • wait until the BIOS/firmware is fixed to no longer provide the wrong information,
  • wait for a newer kernel to provide a workaround,
  • or compile the current kernel with CONFIG_CRYPTO_DEV_SP_PSP set to no.

Apparently, since 4.19-rc5 the kernel has a workaround for the issue, but at the time of writing Buster and Sid only have 4.18. So, off to compile a new kernel without that setting. This turned out to have become a lot more complicated since the last time I had to do this on a Debian system (which was still using make-kpkg at the time, so yes, it’s been several years since I last needed to compile a kernel). It also doesn’t help that quite a bit of documentation out there is outdated — the Debian Kernel handbook seems to be the proper documentation.
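The steps below assume that the Debian kernel source tree and its build dependencies are already present; a minimal sketch of getting them (assuming deb-src entries are enabled in sources.list):

    # fetch the Debian-patched kernel source and its build dependencies
    apt-get source linux
    apt-get build-dep linux
    cd linux-*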

Unfortunately, it’s still easy to get it wrong. I ran into issues with certificates (why exactly you would need certificates to compile a kernel is beyond me; my wild guess is that it’s related to “secure” boot), my changes to the configuration were overwritten during the process, and there were other problems. In the end, the following process “works for me”:

    # cf. chapter 4.2.3 of the kernel handbook: generate the setup for amd64
    make -f debian/rules.gen setup_amd64_none_amd64

    # fire up the config dialog, now enable RCU-BOOST and disable PSP
    make -C debian/build/build_amd64_none_amd64 xconfig

    # build the kernel
    make -f debian/rules.gen binary-arch_amd64_none_amd64

The only problem with this is that the generated kernel has the same version numbering as the official package, so a newer minor version of the Debian package can overwrite your manually built package. I haven’t really looked into this yet, but will update this section once I do (and yes, the process described in section 4.3 of said manual didn’t work for me).
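One workaround I might try is to simply put the package on hold, so apt will not replace it with a newer build of the same name (the exact package name below is hypothetical):

    # prevent apt from replacing the manually built kernel package
    # (package name is hypothetical, check with: dpkg -l 'linux-image*')
    apt-mark hold linux-image-4.18.0-2-amd64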

By the way, don’t expect kernel compilation to be a fast process. Apparently, the kernel configuration that Debian uses out of the box compiles half the world, and consequently this takes ages even on such a high-end machine.

The newly compiled kernel fixed the systemd-udevd crashes, and suspend / hibernate then worked as well. However, when I triggered hibernate from the LXDE logout dialog, the machine wouldn’t power off. Another long round of searching the web, including reading the source of lxsession-logout.c, revealed the solution: disable upower via systemctl disable upower.
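For completeness, hibernate can also be triggered directly through systemd, which is handy to verify it works independently of the logout dialog:

    # stop upower from interfering with the logout dialog's hibernate path
    systemctl disable upower

    # trigger hibernate directly to verify it works
    systemctl hibernate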

Random hangs

During my first attempt to compile a kernel, my system simply hung. Completely: not even the magic SysRq keys worked anymore. That’s apparently another known issue, but there are no good solutions. Setting RCU_NOCB_CPU and RCU_BOOST is one suggested workaround (cf. the soft hang discussion in the AMD forum), but this didn’t really help me (cf. also the random soft lock discussion on the kernel bugtracker). However, the ZenStates github repo linked there contains a Python script which disables the C6 state.

Again, the forum suggests that the issue might be fixed with newer BIOS versions, but my mainboard already has the mentioned AGESA version and the issue still occurred. Disabling the C6 state via the script fixes the problem, but results in higher energy consumption and is hence not exactly a perfect solution either. If you want to run zenstates automatically, there is also a systemd template for zenstates. Note that this needs some additional tweaking (which I haven’t gotten around to yet) to run modprobe msr; a sketch of what that could look like follows below.
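A minimal, untested sketch of such a unit, assuming the script is installed as /usr/local/sbin/zenstates.py (the path and the option name are assumptions, check the repo’s README):

    # /etc/systemd/system/zenstates.service -- minimal sketch, untested
    [Unit]
    Description=Disable the C6 state via ZenStates

    [Service]
    Type=oneshot
    # the msr module must be loaded before the script can access the MSRs
    ExecStartPre=/sbin/modprobe msr
    ExecStart=/usr/local/sbin/zenstates.py --c6-disable

    [Install]
    WantedBy=multi-user.target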

PCI errors

One other thing I noted in the logs was recurring PCI errors. There are a number of suggested fixes, cf. this PCI error answer on askubuntu suggesting pci=nomsi or pci=noaer, or this other PCI error suggestion on unix.stackexchange suggesting pci=nommconf. For me, pci=noaer hides the errors successfully, and for the moment that’s good enough (read: I haven’t investigated whether the other suggestions would actually fix the issue as well).
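Making the parameter permanent is the usual grub dance; roughly:

    # /etc/default/grub: append the parameter to the default command line
    GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=noaer"

    # regenerate the grub configuration afterwards
    update-grub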

Virtualization

The latest thing I ran into was that VirtualBox refused to start a virtual machine, claiming that AMD-V was disabled in the BIOS. This turned out to be the case: SVM was disabled. I actually couldn’t easily find the option at first and had to use the search function in the BIOS.
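A quick way to check this from a running system (at least as far as I understand it) is to try loading the KVM module; if SVM is disabled in the BIOS, the kernel log will complain:

    # if SVM is disabled in the BIOS, this fails and dmesg shows a complaint
    modprobe kvm_amd
    dmesg | grep -i -E 'svm|kvm'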

Two things I haven’t tried out yet

The two things I haven’t used yet are the goodies this motherboard provides over the one I originally wanted: Bluetooth and WLAN. I did install the Intel firmware to support both features, but can’t really say anything more about it yet.
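If I recall correctly, the WLAN part of that firmware comes from the firmware-iwlwifi package (non-free); the Bluetooth side may need an additional non-free firmware package:

    # assumes non-free is enabled in /etc/apt/sources.list
    apt install firmware-iwlwifi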

Conclusion

I don’t have a conclusion just yet. It’s clear this new machine is a lot faster than my old box, but given that the Core2Duo CPU is quite long in the tooth now and that the old box didn’t have an SSD, that’s to be expected. Although this article has gotten longer than I would have hoped for, overall the machine is running quite well (even with me running a testing version of Debian, which is usually rather badly supported). In particular, I can’t say I have run into any issues with the peripherals so far. I guess the machine will have to show its power over the next months, and hopefully there will be more fixes to BIOS/firmware and the kernel as needed.

