2016

TL;DR A software developer should cover every fault case with an appropriate error handler, even if the case seems impossible. Because even impossible cases happen sometimes. Because shit happens.

Let’s see an example.

I am developing an authentication system based on JSON Web Tokens. It works in the following steps.

  1. The client application sends device information to the backend.
  2. The backend saves the information to the database and issues a device token.
  3. The client sends the token along with the user credentials on login.
  4. The backend decodes the token and binds the device to the user.

There is an impossible scenario: the backend successfully decodes the device token but cannot find the device information in the database. This scenario is impossible because the token is issued only after a successful database write; moreover, it contains a device ID generated by the database. And the client cannot forge the token (theoretically) because it doesn’t have the cryptographic key. So the case is impossible, and I don’t have to cover it with a special error handler. Correct? No.

Let’s see what could happen here.

  1. The backend saves a device and issues a token.
  2. The database gets corrupted.
  3. An administrator rolls the database back to a previous snapshot that doesn’t contain the device information.
  4. The client holds a valid device token, but the device information doesn’t exist in the database. The impossible case happens!

If I don’t cover the case with an error handler, the backend will return a vague “500 Internal Server Error.” But it isn’t a server error, it’s a client error, because the client sent an invalid token. The backend must report this with an appropriate error code, so that the client can throw away the invalid token and re-register the device instead of showing a useless error message.
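Such a handler could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual backend: the token is a simple HMAC-signed JSON payload standing in for a real JWT, the database is a plain dict, and `SECRET`, `DEVICES`, `ClientError`, `issue_device_token`, `bind_device`, the 401 status, and the error codes are all hypothetical names and choices.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-key"  # hypothetical signing key, known only to the backend
DEVICES = {}                 # hypothetical stand-in for the devices table


class ClientError(Exception):
    """An error the client can act on; meant to map to a 4xx response."""

    def __init__(self, status, code):
        super().__init__(code)
        self.status = status
        self.code = code


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def issue_device_token(device_id, info):
    DEVICES[device_id] = info  # the token is issued only after the write
    payload = base64.urlsafe_b64encode(
        json.dumps({"device_id": device_id}).encode()
    )
    return payload.decode() + "." + _sign(payload)


def bind_device(token, user):
    payload, _, signature = token.partition(".")
    expected = _sign(payload.encode())
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        raise ClientError(401, "invalid_device_token")
    device_id = json.loads(base64.urlsafe_b64decode(payload))["device_id"]
    device = DEVICES.get(device_id)
    if device is None:
        # The "impossible" case: the signature is valid, but the record is gone.
        raise ClientError(401, "device_not_registered")
    device["user"] = user
    return device


token = issue_device_token(1, {"model": "Pixel"})
DEVICES.clear()  # simulate a rollback to a snapshot without the device
try:
    bind_device(token, "alice")
except ClientError as err:
    print(err.status, err.code)  # prints: 401 device_not_registered
```

With a distinct error code, the client can tell “re-register the device” apart from “try again later,” which a generic 500 response cannot express.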

Therefore, adding error handlers for impossible fault cases increases the resilience of the system.

You may say: “Well, that’s all correct, but it happens so rarely. Why should we care about it? The effort will never pay off.” And you would be wrong. It happens much more frequently than you expect. People are optimists. We suck at estimating risks. Every time we think about something bad, we think it won’t happen, at least not to us. A lot of people die every day of lung cancer and atherosclerosis. But a lot of people keep on smoking and eating fast food. They are optimists. Every day we stumble over poorly developed software, and we keep on developing fragile systems, because we’re optimists too. We all think that shit won’t happen to us, only to someone else. But that isn’t true. The truth is that shit will definitely happen to us.

Here is my top three.

  1. Firefox updated itself and switched off incompatible extensions. LastPass was one of them. It happened just when I had to pay a bill, so I wasn’t able to log in to my online bank because of a browser update! Needless to say, I don’t use Firefox anymore.
  2. Trello lost the connection to its server and silently lost the changes I had made on a board. Communication between my teammates broke down.
  3. The Twitter CDN was down, and my browser wasn’t able to load the JavaScript. Thus I wasn’t able to write a tweet for two days. Nobody was harmed, but it wasn’t good anyway.

The devil is in the details. Your program could work well in ideal conditions, but remember that there are none. So the next time you develop software, please switch your brain to paranoid mode. It will help your system be robust and make the world better.

P.S. Hey, look mom, I’ve invented a cool buzzword!

I was recently asked to interview Python programmers for our team. I gave them a task: implement a dictionary-like structure, Tree, with the following features:

>>> t = Tree()
>>> t['a.x'] = 1
>>> t['a.y'] = 2
>>> t['b']['x'] = 3
>>> t['b']['y'] = 4
>>> t == {'a.x': 1, 'a.y': 2, 'b.x': 3, 'b.y': 4}
True
>>> t['a'] == {'x': 1, 'y': 2}
True
>>> list(t.keys())
['a.x', 'a.y', 'b.x', 'b.y']
>>> list(t['a'].keys())
['x', 'y']

“It’s quite a simple task,” you may think at first glance. But it isn’t; in fact, it’s tricky as hell. Any implementation has its own trade-offs, and you can never claim that one implementation is better than another: it depends on the context. There are also a lot of corner cases that have to be covered with tests. So I expected to discuss such tricks and trade-offs in the interview. I think it is the best way to learn about an interviewee’s problem-solving skills.

However, there is one line of code that gives away a bad solution.

class Tree(dict):

Inheritance from the built-in dict type. Let’s see why you shouldn’t do that and what you should do instead.

The Python dictionary interface has a number of methods that seem to use one another. For example, the reading methods:

>>> d = {'x': 1}
>>> d['x']
1
>>> d.get('x')
1
>>> d['y']          # ``__getitem__`` raises KeyError for undefined keys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'y'
>>> d.get('y')      # whereas ``get`` returns None
>>> d.get('y', 2)   # or default value passed as second argument
2

So you might expect that the dict.get() method is implemented like this:

def get(self, key, default=None):
    try:
        return self[key]
    except KeyError:
        return default

And you might also expect that by overriding dict.__getitem__() you will change the behavior of dict.get() too. But it doesn’t work this way:

>>> class GhostDict(dict):
...     def __getitem__(self, key):
...         if key == 'ghost':
...             return 'Boo!'
...         return super().__getitem__(key)
...
>>> d = GhostDict()
>>> d['ghost']
'Boo!'
>>> d.get('ghost')  # returns None
>>>

This happens because Python’s built-in dict is implemented in C and its methods are independent of one another. It is done for performance, I guess.
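The same independence shows up on the writing side. The sketch below uses a hypothetical `UpperDict` class to illustrate that, in CPython, the constructor, `update()`, and `setdefault()` all store values without going through an overridden `__setitem__`.

```python
class UpperDict(dict):
    """Hypothetical dict subclass that tries to upper-case every value."""

    def __setitem__(self, key, value):
        super().__setitem__(key, value.upper())


d = UpperDict(a="x")    # the constructor bypasses __setitem__
d["b"] = "y"            # only item assignment calls the override
d.update(c="z")         # update() bypasses it
d.setdefault("e", "w")  # and so does setdefault()
print(d)                # prints: {'a': 'x', 'b': 'Y', 'c': 'z', 'e': 'w'}
```

To make such a subclass consistent, you would have to override every one of these methods by hand.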

So what you really need is the Mapping (read-only) or MutableMapping abstract base class from the collections.abc module. These classes provide the full dictionary interface on top of a handful of abstract methods that you have to override, and they work as expected.

>>> from collections.abc import Mapping
>>> class GhostDict(Mapping):
...     def __init__(self, *args, **kw):
...         self._storage = dict(*args, **kw)
...     def __getitem__(self, key):
...         if key == 'ghost':
...             return 'Boo!'
...         return self._storage[key]
...     def __iter__(self):
...         return iter(self._storage)    # ``ghost`` is invisible
...     def __len__(self):
...         return len(self._storage)
...
>>> d = GhostDict(x=1, y=2)
>>> d['ghost']
'Boo!'
>>> d.get('ghost')
'Boo!'
>>> d['x']
1
>>> list(d.keys())
['x', 'y']
>>> list(d.values())
[1, 2]
>>> len(d)
2

Type checking also works as expected:

>>> isinstance(GhostDict(), Mapping)
True
>>> isinstance(dict(), Mapping)
True
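For the interview task itself, the mutable counterpart, MutableMapping, is the natural base. What follows is a rough sketch only, not the ConfigTree implementation: it assumes string keys, keeps everything in one flat dict keyed by dotted paths, and returns a branch view for any key that is not a stored leaf. The `storage` and `prefix` parameters are hypothetical names.

```python
from collections.abc import MutableMapping


class Tree(MutableMapping):
    """Flat dict keyed by dotted paths; branches are views with a prefix."""

    def __init__(self, storage=None, prefix=""):
        self._storage = {} if storage is None else storage
        self._prefix = prefix

    def __getitem__(self, key):
        full = self._prefix + key
        if full in self._storage:
            return self._storage[full]
        # Anything that is not a leaf is treated as a (possibly empty) branch,
        # which is what makes ``t['b']['x'] = 3`` work on a fresh tree.
        return Tree(self._storage, full + ".")

    def __setitem__(self, key, value):
        self._storage[self._prefix + key] = value

    def __delitem__(self, key):
        del self._storage[self._prefix + key]

    def __iter__(self):
        skip = len(self._prefix)
        return (k[skip:] for k in self._storage if k.startswith(self._prefix))

    def __len__(self):
        return sum(1 for _ in self)

    def __contains__(self, key):
        # Only leaves count as members; the inherited version would call
        # __getitem__, which never raises KeyError here.
        return self._prefix + key in self._storage


t = Tree()
t["a.x"] = 1
t["a.y"] = 2
t["b"]["x"] = 3
t["b"]["y"] = 4
print(list(t.keys()))       # prints: ['a.x', 'a.y', 'b.x', 'b.y']
print(list(t["a"].keys()))  # prints: ['x', 'y']
```

The trade-off is visible right away: a missing key never raises KeyError, so `t.get('missing')` returns an empty branch instead of None. Whether that is acceptable depends on the context.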

P.S. You can see my own implementation of the task in the sources of the ConfigTree package. As I said above, it isn’t perfect; it’s just good enough for the context it is used in. And its tests... well, I have no idea what happens there now. I just don’t touch them.

The most miserable being in the world is a lost dog. Number two is a programmer who got legacy. But not the legacy of a rich, childless dead uncle you never knew. No, I mean the legacy code of the guy who worked before you. You are smart; you use agile, test-driven development, continuous integration, and other cool things... But it doesn’t matter anymore. Because the guy preferred to apply hot fixes on production using Vim and SSH. The repository holds outdated, broken code. The live code crashes at random, but nothing useful can be found in the logs. And you have to deal with it. My condolences, you got legacy.

And the most frustrating thing is that you cannot start from scratch, because the product is alive and too much time and money has been invested. You know: you are smart, fix it, please. And then you get paralysis. You have to do something, but you cannot force yourself to start coding. You have a cup of coffee, check your inbox, check for updates on Reddit, Facebook, and Twitter, then check your inbox again, then another cup of coffee, then lunchtime, then updates... and it never ends. But what the heck? You can write code all day long, and all night long. You love it. What is happening here? Why can’t you just start?

The answer is chaos. You don’t know what to do. And you must have a plan.

  1. Explain to your family that your bad mood is not because of them. It’s really important.
  2. Explain to your customer that you are going to fix stuff, but some working things might accidentally get broken. It’s really important too. The paralysis includes the fear of breaking something because you don’t understand how things work together.
  3. Create a new repository and put the code from production into it. You really don’t need to find out which hot fixes were applied to the live code or why the code in the current repository is outdated and broken. Just throw it away.
  4. Set up a working development environment. It will give you some insight into how the code works and will help eliminate the fear.
  5. Make a test plan. Someday you will write automated tests, but for now a simple checklist is enough. It will help you control the process. More control, less fear.
  6. Set up a staging environment and continuous integration. No comment, you must have it.

By the time you complete these six steps, you will understand what you have to do and how to do it. The paralysis will go away, and the confidence will come back. Go ahead, and may the force be with you.