2013

My enthusiasm for learning D is contagious. Some of my colleagues ask me from time to time about useful resources. Here is a list of them.

  • APT repository for D—if you are an Ubuntu fan (like me), you don’t need an explanation of what it is. Here you can get the latest stable DMD compiler and some useful libraries and tools.
  • DUB—a build tool with support for managing dependencies. Its features are similar to Maven for Java and pip for Python.
  • Derelict—an awesome collection of bindings to popular C libraries. Useful in game development.
  • Vibe.d—the web framework, nuff said. Frankly, I haven’t spent a lot of time fiddling with it, but it looks promising.
  • Phobos—the standard library. It is not as diverse as Python’s, but it is powerful enough. By the way, if you dream of reinventing the wheel, this is your chance! There is still a lot of work to do.
  • The D Programming Language by Andrei Alexandrescu—the book you must read.

That is all for now. I wish you happy hacking!

What is a coroutine? A complete explanation can be found in David Beazley’s presentation “A Curious Course on Coroutines and Concurrency.” Here is my rough one: a coroutine is a generator which consumes values instead of emitting them.

>>> def gen():  # Regular generator
...     yield 1
...     yield 2
...     yield 3
...
>>> g = gen()
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> def cor():  # Coroutine
...     while True:
...         i = yield
...         print '%s consumed' % i
...
>>> c = cor()
>>> c.next()
>>> c.send(1)
1 consumed
>>> c.send(2)
2 consumed
>>> c.send(3)
3 consumed

As you can see, the yield statement can be used in an assignment to consume values from the outer code. The obviously named send method is used to send a value to the coroutine. Additionally, a coroutine has to be “activated” by calling its next method (or __next__ in Python 3.x). Since coroutine activation may be annoying, the following decorator is usually used for this purpose.

>>> def coroutine(f):
...     def wrapper(*args, **kw):
...         c = f(*args, **kw)
...         c.send(None)    # This is the same as calling ``next()``,
...                         # but works in Python 2.x and 3.x
...         return c
...     return wrapper
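
For example, the coroutine from the first listing can be redefined with the decorator; this is just a quick check that no explicit activation call is needed any more:

>>> @coroutine
... def cor():
...     while True:
...         i = yield
...         print '%s consumed' % i
...
>>> c = cor()   # Already activated by the decorator
>>> c.send(1)
1 consumed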

If you need to shut down a coroutine, use its close method. Calling it will raise a GeneratorExit exception inside the coroutine. The exception will also be raised when the coroutine is destroyed by the garbage collector.

>>> @coroutine
... def worker():
...     try:
...         while True:
...             i = yield
...             print "Working on %s" % i
...     except GeneratorExit:
...         print "Shutdown"
...
>>> w = worker()
>>> w.send(1)
Working on 1
>>> w.send(2)
Working on 2
>>> w.close()
Shutdown
>>> w.send(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> w = worker()
>>> del w  # BTW, this test will not pass in PyPy. You should explicitly call ``gc.collect()``
Shutdown

This exception cannot be “swallowed,” because that will cause a RuntimeError. Catching it should be used for freeing resources only.

>>> @coroutine
... def bad_worker():
...     while True:
...         try:
...             i = yield
...             print "Working on %s" % i
...         except GeneratorExit:
...             print "Do not disturb me!"
...
>>> w = bad_worker()
>>> w.send(1)
Working on 1
>>> w.close()
Do not disturb me!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: generator ignored GeneratorExit

That is all you need to know about coroutines to start using them. Let’s see what benefits they give. In my opinion, a single coroutine is useless. The true power of coroutines comes when they are used in pipelines. A simple abstract example: take the even numbers from an input source, then multiply each of them by 2, then add 1.

>>> @coroutine
... def apply(op, next=None):
...     while True:
...         i = yield
...         i = op(i)
...         if next:
...             next.send(i)
...
>>> @coroutine
... def filter(cond, next=None):
...     while True:
...         i = yield
...         if cond(i) and next:
...             next.send(i)
...
>>> result = []
>>> pipeline = filter(lambda x: not x % 2, \
...            apply(lambda x: x * 2, \
...            apply(lambda x: x + 1, \
...            apply(result.append))))
>>> for i in range(10):
...     pipeline.send(i)
...
>>> result
[1, 5, 9, 13, 17]

Schema of the pipeline

But the same pipeline can be implemented using generators:

>>> def apply(op, source):
...     for i in source:
...         yield op(i)
...
>>> def filter(cond, source):
...     for i in source:
...         if cond(i):
...             yield i
...
>>> result = [i for i in \
...     apply(lambda x: x + 1, \
...     apply(lambda x: x * 2, \
...     filter(lambda x: not x % 2, range(10))))]
>>> result
[1, 5, 9, 13, 17]

So what is the difference between coroutines and generators? The difference is that generators can be connected into a straight pipeline only, i.e. single input, single output, whereas coroutines may have multiple outputs. Thus they can be connected into really complicated forked pipelines. For example, the filter coroutine could be implemented in this way:

>>> @coroutine
... def filter(cond, ontrue=None, onfalse=None):
...     while True:
...         i = yield
...         next = ontrue if cond(i) else onfalse
...         if next:
...             next.send(i)
...
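
For instance, here is a quick demonstration of forking (my own sketch, reusing the apply coroutine from above), which splits a stream into even and odd numbers:

>>> evens, odds = [], []
>>> splitter = filter(lambda x: not x % 2, \
...                   ontrue=apply(evens.append), \
...                   onfalse=apply(odds.append))
>>> for i in range(10):
...     splitter.send(i)
...
>>> evens
[0, 2, 4, 6, 8]
>>> odds
[1, 3, 5, 7, 9]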

But let’s look at another example. Here is a mock of a distributed computing system with a cache, a load balancer, and three workers.

def coroutine(f):
    def wrapper(*arg, **kw):
        c = f(*arg, **kw)
        c.send(None)
        return c
    return wrapper


@coroutine
def logger(prefix="", next=None):
    while True:
        message = yield
        print("{0}: {1}".format(prefix, message))
        if next:
            next.send(message)


@coroutine
def cache_checker(cache, onsuccess=None, onfail=None):
    while True:
        request = yield
        if request in cache and onsuccess:
            onsuccess.send(cache[request])
        elif onfail:
            onfail.send(request)


@coroutine
def load_balancer(*workers):
    while True:
        for worker in workers:
            request = yield
            worker.send(request)


@coroutine
def worker(cache, response, next=None):
    while True:
        request = yield
        cache[request] = response
        if next:
            next.send(response)


cache = {}
response_logger = logger("Response")
cluster = load_balancer(
    logger("Worker 1", worker(cache, 1, response_logger)),
    logger("Worker 2", worker(cache, 2, response_logger)),
    logger("Worker 3", worker(cache, 3, response_logger)),
)
cluster = cache_checker(cache, response_logger, cluster)
cluster = logger("Request", cluster)


if __name__ == "__main__":
    from random import randint


    for i in range(20):
        cluster.send(randint(1, 5))

Schema of the distributed computing system mock

To start loving coroutines, try to implement the same system without them. Of course, you can implement some classes which store state in their attributes and do the work in a send method:

class worker(object):

    def __init__(self, cache, response, next=None):
        self.cache = cache
        self.response = response
        self.next = next

    def send(self, request):
        self.cache[request] = self.response
        if self.next:
            self.next.send(self.response)

But I dare you to find a beautiful implementation of the load balancer in this way!
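
For comparison, here is my own rough sketch of a class-based load balancer (using itertools.cycle for the round-robin state). The loop state which the coroutine kept for free now has to be stored explicitly:

from itertools import cycle

class load_balancer(object):

    def __init__(self, *workers):
        # The implicit ``for worker in workers`` loop state of the
        # coroutine version becomes an explicit attribute here
        self.workers = cycle(workers)

    def send(self, request):
        # Pick the next worker in round-robin order
        next(self.workers).send(request)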

I hope I have persuaded you that coroutines are cool. So if you are going to try them, take a look at my library—CoPipes. It is helpful for building really big and complicated data processing pipelines. Your feedback is welcome.

When I hear “It should be done in this way,” I always ask the question “Why?” Indeed, if everyone believed in design patterns but did not understand them, engineering would be yet another religion.

So what about “Fat model, skinny controller” (and vice versa) in the MVC pattern? Where do the controller’s responsibilities end and the model’s begin?

Well, the main goals of every design pattern are encapsulation, isolation, and code reusability. The MVC pattern consists of three parts. The model is responsible for business rules and data management. The view is responsible for representing the model in the user interface. And the controller binds model and view, i.e. makes them work together. “Thank you, Captain Obvious,” you would say, but keep in mind the key point: the model knows nothing about the controller and the view.

Imagine a typical web application. You have carefully implemented HTTP request handling, HTML generation, user sessions, and other web-related things. Now the client asks you: “Look, it works great. But we need to interact with the application using an RPC protocol. Could you implement it?” Would you have to change the code of your models? Would there be some unusable code, i.e. code which is used in the web interface only? If you answered “yes,” you should change your design approach.

The model should be reusable in any interface without modification: web, RPC, shell, desktop, mobile—you name it. Therefore, any interface-specific code should be placed in the controller (or view). By the way, TDD helps to keep models clean. If you test models in isolation, there will be no unnecessary code, because the test environment is yet another interface!

However, there is a nuance: data validation. On one hand, it is clearly the model’s job. On the other, it would cause great performance overhead in batch processing from a trusted source. I think the validation layer should be separate from the model or at least switchable. And of course, it should not be mixed with the sanitization layer. For example, protecting from XSS attacks is the web controller’s job, while protecting from SQL injection is the model’s. The validation layer (when it is needed) should be placed somewhere between these two. The solution depends on the task; there is no silver bullet.
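
To illustrate the idea of a switchable validation layer, here is a minimal sketch of my own (the User model and helper names are hypothetical, not from any real project):

class User(object):
    """ A plain model class, free of interface-specific code """

    def __init__(self, name, email):
        self.name = name
        self.email = email


def validate_user(data):
    """ Validation layer: lives between controller and model """
    if not data.get('name'):
        raise ValueError('name is required')
    return data


def create_user(data, validate=True):
    # A web controller calls this with validation switched on;
    # a batch importer working with a trusted source switches it off
    if validate:
        data = validate_user(data)
    return User(data['name'], data.get('email'))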

In a nutshell, keep models clean and reusable. Will they be fat or skinny? Nobody cares. Make them play sports and they will be in good shape!

I was fiddling with D during my vacation, while taking breaks from building my house. So I am going to write about the most exciting (from my point of view) feature of this programming language—the uniform function call syntax.

As usual, an example is the best explanation.

import std.stdio;

int twice(int i) {
    return i * 2;
}

void main() {
    assert(twice(10) == 20);
}

There is nothing interesting here. The code executes as expected. But let’s make some changes.

import std.stdio;

int twice(int i) {
    return i * 2;
}

void main() {
    assert(10.twice == 20);  // Does it work?
}

Oh heck, that works! Yes, it is exactly what you think. You can call any non-member function as if it were a member of some type (built-in or user-defined, it does not matter), as long as the function accepts an argument of that type as its first parameter. In other words, you can write obj.func(arg1, arg2) instead of func(obj, arg1, arg2). In addition, you can omit the parentheses if there are no other arguments.

The first benefit of the feature is chaining:

import std.array;
import std.algorithm;

void main() {
    auto arr = [1, 2, 3, 4, 5]
        .filter!("a % 2")
        .map!("a * 2")
        .array();
    assert(arr == [2, 6, 10]);
}

dQuery is waiting for its heroes :)

The second one is some sort of monkey patching. However, you cannot totally change a third-party class’s behavior, because your non-member functions have no access to its private and protected members. But you can extend it.

And last but not least, it significantly improves code readability.

import std.file;
import std.json;

void configure(string configPath) {
    // auto config = parseJSON(readText(configPath));  Never again!
    auto config = configPath.readText().parseJSON();
    // Do something useful...
}

P.S. Even though this feature is mentioned in Andrei Alexandrescu’s book “The D Programming Language,” for a long time it worked for arrays only. But now it works for any type. I have checked it with the DMD v2.062 compiler.

After I had published my previous article, I got some feedback from my colleagues. And there was a simple (at first glance) but interesting question that I am going to discuss. Why do I use the __init__ method in my metaclass? Wouldn’t __new__ be more Pythonic?

Indeed, all the articles I have ever read describe metaclasses using the __new__ method in their examples. Frankly, I used it too in the previous version of the GreenRocket library. It was cargo cult. And I postponed publishing this article until I had fixed that.

Nevertheless, the main goal of the previous article was to show that we can use classes as regular objects. And it seems to have been achieved. But the metaclass mechanism is not limited to this use case only. The Python documentation says about it: “The potential uses for metaclasses are boundless. Some ideas that have been explored include logging, interface checking, automatic delegation, automatic property creation, proxies, frameworks, and automatic resource locking/synchronization.” So you really need the power of the __new__ method sometimes:

>>> class Meta(type):
...     def __new__(meta, name, bases, attrs):
...         filtered_bases = []
...         for base in bases:
...             if isinstance(base, type):
...                 filtered_bases.append(base)
...             else:
...                 print(base)
...         return type.__new__(meta, name, tuple(filtered_bases), attrs)
...
>>> class Test(object, 'WTF!?', 'There are strings in bases!'):
...     __metaclass__ = Meta
...
WTF!?
There are strings in bases!
>>> Test.__mro__
(<class '__main__.Test'>, <type 'object'>)

However, I am pretty sure that you should avoid __new__ as much as you can, because it significantly decreases flexibility. For example, what happens if you inherit a new class from two others with two different metaclasses?

>>> class AMeta(type): pass
...
>>> class BMeta(type): pass
...
>>> class A(object): __metaclass__ = AMeta
...
>>> class B(object): __metaclass__ = BMeta
...
>>> class C(A, B): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

As you can see, you get a conflict. You have to create a new metaclass based on both existing ones:

>>> class CMeta(AMeta, BMeta): pass
...
>>> class C(A, B): __metaclass__ = CMeta
...

If these two metaclasses define just the __init__ method, it will be simple:

>>> class CMeta(AMeta, BMeta):
...     def __init__(cls, name, bases, attrs):
...         AMeta.__init__(cls, name, bases, attrs)
...         BMeta.__init__(cls, name, bases, attrs)

But if both of them define __new__, a walk in the park turns into a run through hell. And this is not a hypothetical example. Try to mix collections.Mapping into a model declaration class based on your favorite ORM. I got such a task on my previous project.
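
To see why, here is a minimal sketch of my own (hypothetical ANewMeta and BNewMeta, both calling type.__new__ directly, as most examples do):

>>> class ANewMeta(type):
...     def __new__(meta, name, bases, attrs):
...         attrs['from_a'] = True
...         return type.__new__(meta, name, bases, attrs)
...
>>> class BNewMeta(type):
...     def __new__(meta, name, bases, attrs):
...         attrs['from_b'] = True
...         return type.__new__(meta, name, bases, attrs)
...
>>> class CNewMeta(ANewMeta, BNewMeta):
...     def __new__(meta, name, bases, attrs):
...         # Unlike __init__, each __new__ call creates and returns
...         # its own class object, so we cannot simply call both
...         # parent methods and merge the results
...         return ANewMeta.__new__(meta, name, bases, attrs)
...
>>> class C(object):
...     __metaclass__ = CNewMeta
...
>>> C.from_a
True
>>> C.from_b    # BNewMeta.__new__ has never been called
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'C' has no attribute 'from_b'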

In conclusion: use the __new__ method only if you are going to do something which is unfeasible in __init__. And think twice before copying code from examples, even if the examples are from official documentation.

Every article about Python metaclasses contains a quotation (yep, this one is no exception) from Tim Peters: “Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why).” I completely disagree with this saying. Why? Because I hate magic. Moreover, I hate it when something is explained using magic. Metaclasses are regular tools, and they are very useful in some cases. What cases? Let’s see.

As you know, classes in Python are full-featured objects. Like any object, they are constructed using classes. Thus, a class which is used for constructing another class is called a metaclass. By default, type is used in this role.

>>> class SomeClass(object):
...     pass
...
>>> SomeClass.__class__
<type 'type'>

When you need a custom metaclass, you should inherit it from type, just like a regular class inherits from object:

>>> class SomeMetaClass(type):
...     pass
...
>>> class AnotherClass(object):                            # Python 2.x syntax
...     __metaclass__ = SomeMetaClass
...
>>> class AnotherClass(object, metaclass=SomeMetaClass):   # Python 3.x syntax
...     pass
...
>>> AnotherClass.__class__
<class '__main__.SomeMetaClass'>

The syntax shown above usually confuses newbies, because the magic is still there. Okay, forget about metaclasses for now. Let’s think about objects:

>>> obj = SomeClass()

What happens in this single line of code? We just create a new object of class SomeClass and assign a reference to this object to the variable obj. Clear. Let’s go on.

>>> AnotherClass = SomeMetaClass('AnotherClass', (object,), {})

And what happens here? Exactly the same thing, but we create a class instead of a regular object. This is what happens behind the magic syntax. The interpreter parses the syntactic sugar of the class declaration and executes it as shown above. The first parameter passed into the metaclass call is the class name (it will be available in the AnotherClass.__name__ attribute). The second one is a tuple of parent (or base) classes. And the third one is the body of the class—its attributes and methods (it will be accessible via AnotherClass.__dict__).
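
A quick check of all three parameters (using a throwaway Point class of my own):

>>> Point = SomeMetaClass('Point', (object,), {'x': 0, 'y': 0})
>>> Point.__name__
'Point'
>>> Point.__bases__
(<type 'object'>,)
>>> Point.x, Point.y
(0, 0)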

If you work with JavaScript, this should be familiar to you. There are no classes in JavaScript. Therefore, when you emulate them, you have to call a factory function which returns an object to be used later as a class. A Python metaclass works in the same way, but more conveniently.

The last question is: why do we need this feature? Is simple inheritance not enough? Well, an example is the best explanation. Let’s take a look at the GreenRocket library (hmm... implicit advertisement). Don’t worry, it is not about rocket science. It is a simple implementation of the Observer design pattern. There are about 150 lines of code, 70 of which are docstrings.

You create a class of signals:

>>> from greenrocket import Signal
>>> class MySignal(Signal):
...     pass
...

Subscribe a handler on it:

>>> @MySignal.subscribe
... def handler(signal):
...     print('handler: ' + repr(signal))
...

Then create and fire a signal:

>>> MySignal().fire()
handler: MySignal()

...and the handler is called. Here is the body of the subscribe method:

@classmethod
def subscribe(cls, handler):
    """ Subscribe handler to signal.  May be used as decorator """
    cls.logger.debug('Subscribe %r on %r', handler, cls)
    cls.__handlers__.add(handler)
    return handler

Look at the cls.__handlers__ attribute. The library’s logic is based on the fact that each signal class must have its own copy of this attribute. If there were no metaclasses in Python, the library would require an explicit declaration of it in the following way:

>>> class MySignal(Signal):
...     __handlers__ = WeakSet()
...

But it is stupid copy-paste work. In addition, it is a bug-prone solution:

>>> class MySecondSignal(MySignal):
...     pass
...

If the user forgets the __handlers__ attribute, MySecondSignal will actually use the handlers of MySignal. Good luck debugging! That is why we need a metaclass here: it just does this work for us:

from weakref import WeakSet


class SignalMeta(type):
    """ Signal Meta Class """

    def __init__(cls, class_name, bases, attrs):
        cls.__handlers__ = WeakSet()
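
A quick sanity check with a throwaway DemoSignal hierarchy (simplified; the real library does a bit more) shows that every class now gets its own fresh WeakSet:

>>> class DemoSignal(object):
...     __metaclass__ = SignalMeta
...
>>> class First(DemoSignal):
...     pass
...
>>> class Second(First):
...     pass
...
>>> First.__handlers__ is Second.__handlers__
False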

As you can see, there is no magic. Of course, there are still some corner cases which are not explained in this article. But I hope it will be useful as a quick start for understanding Python metaclasses.

In my opinion, there are three levels of learning a language. First of all, you learn basic grammar and vocabulary. Then you learn specific things such as idioms and advanced constructions. And finally, you learn obscene language. Where and when to use the latter is a very personal matter. But we cannot deny the fact that swearing makes speech more expressive.

The swearwords of programming languages are called “dirty hacks.” Usually, it is strongly recommended to avoid them. However, hacking sometimes makes a program better. Let’s take a look at some obscene Python.

>>> class A: pass
...
>>> class B: pass
...
>>> a = A()
>>> isinstance(a, A)
True
>>> a.__class__ = B
>>> isinstance(a, A)
False
>>> isinstance(a, B)
True

Well, you would think: “If someone on my team used this feature, I would commit a murder.” Frankly, it is not a feature. It is hard to believe that Guido van Rossum and other Python developers were thinking: “We definitely need the ability to change an object’s class at runtime.” It is rather a side effect of Python’s design. Anyway, I’m going to change your mind about this hack.

Imagine a CMS where each page is described by a regular Python dictionary object (stored in MongoDB, for example). So you need a way to map these objects to some more useful ones. Obviously, each page has at least a title and a body:

class Page(object):
    """ A base class for representing pages """

    def __init__(self, data):
        self.title = data['title']
        self.body = data['body']

Also, a page may have a number of additional widgets, which can be represented by mixins:

class Commentable(object):
    """ Adds comments on Page """

    def get_comments(self, page_num=1):
        """ Get list of comments for specified page number """

    def add_comment(self, user, comment):
        """ User comments Page """

    def remove_comment(self, comment_id):
        """ Moderator or comment author removes comment from Page """


class Likeable(object):
    """ Adds "like/dislike" buttons on Page """

    def like(self, user):
        """ User likes Page """

    def dislike(self, user):
        """ User dislikes Page """


class Favoritable(object):
    """ Adds "favorite" button on Page """

    def add_to_favorites(self, user):
        """ User adds Page to favorites """

    def remove_from_favorites(self, user):
        """ User removes Page from favorites """

The problem is how to put them together. A classical solution from “Design Patterns” by the Gang of Four is a factory. It may be an additional class or function which takes a page descriptor dictionary, extracts the mixin set, builds a class based on Page and the specified mixins, and returns an object of this class. But why do we need this additional entity? Let’s do it inside the Page class directly:

class Page(object):
    """ A base class for representing pages """

    mixins = {}     # a map of registered mixins
    classes = {}    # a map of classes for each mixin combination

    @classmethod
    def mixin(cls, class_):
        """ Decorator registers mixin class """
        cls.mixins[class_.__name__] = class_
        return class_

    @classmethod
    def get_class(cls, mixin_set):
        """ Returns class for given mixin combination """
        mixin_set = tuple(mixin_set)    # Turn list into hashable type
        if mixin_set not in cls.classes:
            # Build new class, if it doesn't exist
            bases = [cls.mixins[class_name] for class_name in mixin_set]
            bases.append(Page)
            name = ''.join(class_.__name__ for class_ in bases)
            cls.classes[mixin_set] = type(name, tuple(bases), {})
        return cls.classes[mixin_set]

    def __init__(self, data):
        self.title = data['title']
        self.body = data['body']
        self.__class__ = self.get_class(data['mixins'])    # Fu^WHack you!!!

...register our mixins:

@Page.mixin
class Commentable(object):
    """ Adds comments on Page """


@Page.mixin
class Likeable(object):
    """ Adds "like/dislike" buttons on Page """


@Page.mixin
class Favoritable(object):
    """ Adds "favorite" button on Page """

...and test it:

somepage = Page({
    'title': 'Lorem Ipsum',
    'body': 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
    'mixins': ['Commentable', 'Likeable'],
})

assert somepage.__class__.__name__ == 'CommentableLikeablePage'
assert isinstance(somepage, Commentable)
assert isinstance(somepage, Likeable)
assert isinstance(somepage, Page)

See the full source of the example.

So what did we get? We got a beautiful solution based on a questionable feature. This is exactly the situation where using bad words makes speech better. Do you agree? No? Hack you!

P.S. If you combine this with Pyramid traversal, you will get a super flexible and powerful CMS. But this is another story.

This article was inspired by an interesting conversation between me and my colleague during a lunch break. I’m not going to talk about what’s wrong with PHP here. Too many spears have been broken on this battlefield. If you want to know my opinion, read the article “PHP: a fractal of bad design.” I agree with every single word of it. What I’m going to do now is answer the following question: “Why does every former PHP developer blame it?”

You know, every time there is a discussion about PHP, a guy appears and says: “OK. You are cool. You have learned something else. You don’t use PHP anymore. So what’s your problem? Why don’t you shut up and get off my lawn?” Actually, I don’t care about PHP developers who are too lazy to learn anything else. That is their business. The IT industry evolves so fast that every lazy bum will sink very soon. Again, if you don’t want to learn anything new, you will go down and find yourself unemployed. My real problem is the customers who want me to use PHP because they believe it will be easier and cheaper to find developers for support in the future. And that is the total bullshit which I want to discuss.

Why are PHP developers cheap? Why are there so many of them ready to be hired? The answer is in the article “Finding Great Developers” by Joel Spolsky (everyone loves and reads Joel, don’t they?): because there are a lot of unqualified amateurs on the market. Most good developers already work somewhere. And they learn something new. Every day. And if they know, for example, Ruby or Python, you will never make them use PHP. As a result, the market is overflowing with laymen, and most of them are PHP guys. So are you ready to hire them to support your brilliant startup? Think about it. Each of them has already drowned a couple of projects. Are you ready? No? Then stop using PHP and let it die.

Because it must. Because every time you start a new project using PHP, I will kill a little puppy. You have been warned.