Don’t inherit Python built-in dict type

March 23, 2016

I have been asked to interview Python programmers for our team recently. And I gave them a task—implement dictionary-like structure Tree with the following features:

>>> t = Tree()
>>> t['a.x'] = 1
>>> t['a.y'] = 2
>>> t['b']['x'] = 3
>>> t['b']['y'] = 4
>>> t == {'a.x': 1, 'a.y': 2, 'b.x': 3, 'b.y': 4}
True
>>> t['a'] == {'x': 1, 'y': 2}
True
>>> list(t.keys())
['a.x', 'a.y', 'b.x', 'b.y']
>>> list(t['a'].keys())
['x', 'y']

“It’s quite simple task,” you may think at a glance. But it isn’t, in fact it’s tricky as hell. Any implementation has its own trade-offs and you can never claim that one implementation better another—it depends on context. There is also a lot of corner cases that have to be covered with tests. So I expected to discuss such tricks and trade-offs on the interview. I think, it is the best way to learn about interviewee problem solving skills.

However, there is one line of code that gives away bad solution.

class Tree(dict):

Inheritance from built-in dict type. Let’s see why you shouldn’t do that and what you should do instead.

Python dictionary interface has number of methods that seems to use one another. For example, reading methods:

>>> d = {'x': 1}
>>> d['x']
1
>>> d.get('x')
1
>>> d['y']          # ``__getitem__`` raises KeyError for undefined keys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'y'
>>> d.get('y')      # whereas ``get`` returns None
>>> d.get('y', 2)   # or default value passed as second argument
2

So you can expect that dict.get() method is implemented like this:

def get(self, key, default=None):
    try:
        return self[key]
    except KeyError:
        return default

And you can also expect that overriding dict.__getitem__() behavior you will override dict.get() behavior too. But it doesn’t work this way:

>>> class GhostDict(dict):
...     def __getitem__(self, key):
...         if key == 'ghost':
...             return 'Boo!'
...         return super().__getitem__(key)
...
>>> d = GhostDict()
>>> d['ghost']
'Boo!'
>>> d.get('ghost')  # returns None
>>>

It happens, because Python built-in dict is implemented on C and its methods are independent of one another. It is done for performance, I guess.

So what you really need is Mapping (read-only) or MutableMapping abstract base classes from collections.abc module. The classes provide full dictionary interface based on a handful of abstract methods you have to override and they work as expected.

>>> from collections.abc import Mapping
>>> class GhostDict(Mapping):
...     def __init__(self, *args, **kw):
...         self._storage = dict(*args, **kw)
...     def __getitem__(self, key):
...         if key == 'ghost':
...             return 'Boo!'
...         return self._storage[key]
...     def __iter__(self):
...         return iter(self._storage)    # ``ghost`` is invisible
...     def __len__(self):
...         return len(self._storage)
...
>>> d = GhostDict(x=1, y=2)
>>> d['ghost']
'Boo!'
>>> d.get('ghost')
'Boo!'
>>> d['x']
1
>>> list(d.keys())
['y', 'x']
>>> list(d.values())
[1, 2]
>>> len(d)
2

Type checking also works as expected:

>>> isinstance(GhostDict(), Mapping)
True
>>> isinstance(dict(), Mapping)
True

P.S. You can see my own implementation of the task in the sources of ConfigTree package. As I said above, it isn’t perfect, it’s just good enough for the context it is used in. And its tests... well, I have no idea what happens there now. I just don’t touch them.