Shit Driven Development

May 03, 2016

TL;DR A software developer should cover every fault case with an appropriate error handler, even if the case seems impossible. Because even impossible cases happen sometimes. Because shit happens.

Let’s see an example.

I am developing an authentication system based on JSON Web Tokens. It works in the following steps.

The client application sends device information to the backend.
The backend saves the information into the database and issues a device token.
The client sends the token along with the user credentials on login.
The backend decodes the token and binds the device to the user.
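
The steps above can be sketched roughly as follows. This is only an illustration, not my production code: the in-memory `devices` dict stands in for the database, `SECRET_KEY` and all function names are made up for the example, the token is a hand-rolled HS256-style JWT built with the Python standard library, and the user credentials are reduced to a bare username.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"demo-secret"  # assumption: a server-side key the client never sees
devices = {}                 # stand-in for the database: device ID -> device record


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    """Reverse of _b64: restore the padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(message: str) -> str:
    """HMAC-SHA256 signature of the 'header.payload' string."""
    digest = hmac.new(SECRET_KEY, message.encode("ascii"), hashlib.sha256).digest()
    return _b64(digest)


def register_device(info: dict) -> str:
    """Steps 1-2: save the device first, then issue a token carrying its ID."""
    device_id = len(devices) + 1  # pretend the database generated this ID
    devices[device_id] = {"info": info, "user": None}
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"device_id": device_id}).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload)}"


def decode_device_token(token: str) -> int:
    """Verify the signature and return the device ID, or raise ValueError."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        raise ValueError("malformed token")
    expected = _sign(f"{header}.{payload}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("bad signature")
    return json.loads(_unb64(payload))["device_id"]


def login(token: str, username: str) -> None:
    """Steps 3-4: decode the token and bind the device to the user."""
    device_id = decode_device_token(token)
    devices[device_id]["user"] = username  # optimistic: assumes the record exists
```

Note the last line of `login`: it simply assumes that the record is there.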

There is an impossible scenario: the backend successfully decodes the device token but cannot find the device information in the database. This scenario is impossible because the token is issued only after a successful database write; moreover, it contains a device ID generated by the database. And the client cannot forge the token (theoretically), because it doesn’t have the cryptographic key. So the case is impossible, and I don’t have to cover it with a special error handler. Correct? No.

Let’s see what could happen here.

The backend saves a device and issues a token.
The database gets corrupted.
An administrator rolls the database back to a previous snapshot that doesn’t contain the device information.
The client holds a valid device token, but the device information no longer exists in the database. The impossible case has happened!

If I don’t cover the case with an error handler, the backend will return a vague “500 Internal Server Error.” But it isn’t a server error; it’s a client error, because the client sends an invalid token. The backend must report it with an appropriate error code, so that the client can throw away the invalid token and re-register the device instead of showing a useless error message.
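
Here is a minimal sketch of such a handler. Everything in it is an assumption made for illustration: the function and exception names, the plain dict playing the role of the database, the choice of 401 as the status code, and the shape of the error body. The device ID is assumed to come from a token whose signature has already been verified.

```python
from http import HTTPStatus


class UnknownDeviceError(Exception):
    """The token is valid, but the device it names is not in the database."""


def bind_device(devices: dict, device_id: int, username: str) -> None:
    """Bind the device to the user, refusing to assume the record exists."""
    device = devices.get(device_id)
    if device is None:
        raise UnknownDeviceError(device_id)
    device["user"] = username


def handle_login(devices: dict, device_id: int, username: str):
    """Return (status, body) for a login request."""
    try:
        bind_device(devices, device_id, username)
    except UnknownDeviceError:
        # The "impossible" case: tell the client what is wrong and what to do.
        return HTTPStatus.UNAUTHORIZED, {
            "error": "unknown_device",
            "action": "reregister_device",
        }
    return HTTPStatus.OK, {"user": username}


# Simulate the rollback: device 1 was saved, then an older snapshot was restored.
database = {1: {"info": {"model": "demo-phone"}, "user": None}}
database = {}  # the snapshot predates the device

status, body = handle_login(database, 1, "alice")
print(status.value, body["error"])  # prints: 401 unknown_device
```

With a response like this, the client has enough information to drop the stale token and register the device again.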

Therefore, adding error handlers for impossible fault cases makes the system more robust.

You may say: “Well, that’s all correct, but it happens so rarely. Why should we care about it? The effort will never pay off.” And you would be wrong. It happens much more frequently than you expect. People are optimists. We suck at estimating risks. Every time we think about something bad, we think it won’t happen, at least not to us. A lot of people die every day of lung cancer and atherosclerosis, but a lot of people keep on smoking and eating fast food. They are optimists. Every day we stumble over poorly developed software, and we keep on developing fragile systems, because we’re optimists too. We all think that shit won’t happen to us, that it could only happen to someone else. But that isn’t true. The truth is that shit will definitely happen to us.

Here is my top three.

Firefox updated itself and switched off incompatible extensions. LastPass was one of them. It happened just when I had to pay a bill, so I wasn’t able to log in to my online bank because of a browser update! Needless to say, I don’t use Firefox anymore.
Trello lost the connection to its server and silently lost the changes I had made on a board. Communication within my team broke down.
Twitter’s CDN was down, and my browser wasn’t able to load the JavaScript. Thus I wasn’t able to write a tweet for two days. Nobody got harmed, but it wasn’t good anyway.

The devil is in the details. Your program could work well in ideal conditions, but remember that there are none. So the next time you develop software, please switch your brain into paranoid mode. It will help your system be robust and the world be better.

P.S. Hey, look mom, I’ve invented a cool buzzword!