If you develop applications that take untrusted input, you deal with validation, no matter which framework or library you use. It is a common task. So I am going to share a recipe that neatly integrates the validation layer with the business logic layer. It is not about what data to validate or how to validate it; it is mostly about making the code look better using Python decorators and magic methods.

Let’s say we have a class User with a method login.

class User(object):

    def login(self, username, password):
        ...

And we have a validation schema, Credentials. I use Colander, but it does not matter; you can simply replace it with your favorite library:

import colander


class Credentials(colander.MappingSchema):

    username = colander.SchemaNode(
        colander.String(),
        validator=colander.Regex(r'^[a-z0-9\_\-\.]{1,20}$')
    )
    password = colander.SchemaNode(
        colander.String(),
        validator=colander.Length(min=1, max=100),
    )

Each time you call login with untrusted data, you have to validate the data using the Credentials schema:

user = User()
schema = Credentials()

trusted_data = schema.deserialize(untrusted_data)
user.login(**trusted_data)
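
If the data does not pass, deserialize raises a colander.Invalid exception that maps field names to error messages (the sample data below is only for illustration):

try:
    schema.deserialize({'username': 'John Doe!', 'password': 'secret'})
except colander.Invalid as e:
    print(e.asdict())  # e.g. {'username': 'String does not match expected pattern'}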

The extra code is a trade-off for flexibility: such methods can also be called with trusted data, so we cannot just put validation into the method itself. However, we can bind the schema to the method without losing flexibility.

First, create a validation package with the following structure:

myproject/
    __init__.py
    ...
    validation/
        __init__.py
        schema.py

Then add the following code to myproject/validation/__init__.py (again, the use of cached_property is an inessential detail; you can use an equivalent decorator provided by your favorite framework):

from cached_property import cached_property

from . import schema


def set_schema(schema_class):
    def decorator(method):
        # Attach the schema class to the method for later lookup
        method.__schema_class__ = schema_class
        return method
    return decorator


class Mixin(object):

    class Proxy(object):

        def __init__(self, context):
            self.context = context

        def __getattr__(self, name):
            method = getattr(self.context, name)
            schema = method.__schema_class__()

            def validated_method(params):
                # ``deserialize`` validates the params and raises on bad input
                params = schema.deserialize(params)
                return method(**params)

            validated_method.__name__ = 'validated_' + name
            # Cache the wrapper, so ``__getattr__`` runs only once per method
            setattr(self, name, validated_method)
            return validated_method

    @cached_property
    def validated(self):
        return self.Proxy(self)

There are three public objects: the schema module, the set_schema decorator, and the Mixin class. The schema module is a container for all validation schemata; place the Credentials class into it. The set_schema decorator simply attaches the passed validation schema to the decorated method as its __schema_class__ attribute. The Mixin class adds a proxy object named validated, which provides access to the methods carrying a __schema_class__ attribute and lazily creates copies of them wrapped in a validation routine.
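
So myproject/validation/schema.py simply hosts the Credentials class shown at the beginning:

# myproject/validation/schema.py
import colander


class Credentials(colander.MappingSchema):

    username = colander.SchemaNode(
        colander.String(),
        validator=colander.Regex(r'^[a-z0-9\_\-\.]{1,20}$')
    )
    password = colander.SchemaNode(
        colander.String(),
        validator=colander.Length(min=1, max=100),
    )

And this is how it all works together: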

from myproject import validation


class User(validation.Mixin):

    @validation.set_schema(validation.schema.Credentials)
    def login(self, username, password):
        ...

Now we can call the validated login method in a single line of code:

user = User()

user.validated.login(untrusted_data)

So what do we get? The code is more compact; it is still flexible, i.e. we can call the method without validation; and it is more readable and self-documenting.
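
Both call styles still work side by side (the literal credentials below are only for illustration):

user = User()

# Untrusted input goes through the Credentials schema first
user.validated.login(untrusted_data)

# Trusted input (e.g. internal calls or tests) skips validation
user.login(username='john', password='secret')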

What is a coroutine? You can find a complete explanation in David Beazley’s presentation “A Curious Course on Coroutines and Concurrency.” Here is my rough one: it is a generator that consumes values instead of emitting them.

>>> def gen():  # Regular generator
...     yield 1
...     yield 2
...     yield 3
...
>>> g = gen()
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> def cor():  # Coroutine
...     while True:
...         i = yield
...         print('%s consumed' % i)
...
>>> c = cor()
>>> next(c)  # Activate the coroutine
>>> c.send(1)
1 consumed
>>> c.send(2)
2 consumed
>>> c.send(3)
3 consumed

As you can see, the yield statement can be used in an assignment to consume values from the outer code. The aptly named send method is used to send a value into the coroutine. Additionally, a coroutine has to be “activated” by advancing it to its first yield with the built-in next(). Since manual activation is annoying, the following decorator is usually used for this purpose.

>>> def coroutine(f):
...     def wrapper(*args, **kw):
...         c = f(*args, **kw)
...         c.send(None)    # This is the same as calling ``next()``,
...                         # but works in Python 2.x and 3.x
...         return c
...     return wrapper
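
If you want the wrapper to preserve the generator function’s name and docstring, the same decorator can be written with functools.wraps; this is a cosmetic variation, and nothing below depends on it:

>>> import functools
>>> def coroutine(f):
...     @functools.wraps(f)
...     def wrapper(*args, **kw):
...         c = f(*args, **kw)
...         c.send(None)  # Activate the coroutine
...         return c
...     return wrapper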

If you need to shut down a coroutine, use its close method. Calling it raises a GeneratorExit exception inside the coroutine. The same exception is raised when the coroutine is destroyed by the garbage collector.

>>> @coroutine
... def worker():
...     try:
...         while True:
...             i = yield
...             print("Working on %s" % i)
...     except GeneratorExit:
...         print("Shutdown")
...
>>> w = worker()
>>> w.send(1)
Working on 1
>>> w.send(2)
Working on 2
>>> w.close()
Shutdown
>>> w.send(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> w = worker()
>>> del w  # BTW, this does not work on PyPy unless you explicitly call ``gc.collect()``
Shutdown

This exception cannot be “swallowed”: ignoring it causes a RuntimeError. Catch it only to free resources.

>>> @coroutine
... def bad_worker():
...     while True:
...         try:
...             i = yield
...             print("Working on %s" % i)
...         except GeneratorExit:
...             print("Do not disturb me!")
...
>>> w = bad_worker()
>>> w.send(1)
Working on 1
>>> w.close()
Do not disturb me!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: generator ignored GeneratorExit

That is all you need to know about coroutines to start using them. Let’s see what benefits they give. In my opinion, a single coroutine is useless; the true power of coroutines comes when they are connected into pipelines. A simple abstract example: keep only the even numbers from an input source, then multiply each one by 2, then add 1.

>>> @coroutine
... def apply(op, next=None):
...     while True:
...         i = yield
...         i = op(i)
...         if next:
...             next.send(i)
...
>>> @coroutine
... def filter(cond, next=None):
...     while True:
...         i = yield
...         if cond(i) and next:
...             next.send(i)
...
>>> result = []
>>> pipeline = filter(lambda x: not x % 2, \
...            apply(lambda x: x * 2, \
...            apply(lambda x: x + 1, \
...            apply(result.append))))
>>> for i in range(10):
...     pipeline.send(i)
...
>>> result
[1, 5, 9, 13, 17]

[Figure: schema of the pipeline]

But the same pipeline can be implemented using generators:

>>> def apply(op, source):
...     for i in source:
...         yield op(i)
...
>>> def filter(cond, source):
...     for i in source:
...         if cond(i):
...             yield i
...
>>> result = [i for i in \
...     apply(lambda x: x + 1, \
...     apply(lambda x: x * 2, \
...     filter(lambda x: not x % 2, range(10))))]
>>> result
[1, 5, 9, 13, 17]

So what is the difference between coroutines and generators? Generators can be connected only into a straight pipeline, i.e. single input, single output. Coroutines, in contrast, may have multiple outputs, so they can be connected into genuinely complicated, forked pipelines. For example, the filter coroutine could be implemented this way:

>>> @coroutine
... def filter(cond, ontrue=None, onfalse=None):
...     while True:
...         i = yield
...         next = ontrue if cond(i) else onfalse
...         if next:
...             next.send(i)
...
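
Here is a quick sketch of how such a fork might be wired, splitting a stream into two sinks; the collect coroutine is hypothetical, written just for this example:

>>> evens, odds = [], []
>>> @coroutine
... def collect(storage):
...     while True:
...         storage.append((yield))
...
>>> fork = filter(lambda x: not x % 2, ontrue=collect(evens), onfalse=collect(odds))
>>> for i in range(5):
...     fork.send(i)
...
>>> evens, odds
([0, 2, 4], [1, 3])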

But let’s look at another example. Here is a mock of a distributed computing system with a cache, a load balancer, and three workers.

def coroutine(f):
    def wrapper(*args, **kw):
        c = f(*args, **kw)
        c.send(None)    # Activate the coroutine
        return c
    return wrapper


@coroutine
def logger(prefix="", next=None):
    while True:
        message = yield
        print("{0}: {1}".format(prefix, message))
        if next:
            next.send(message)


@coroutine
def cache_checker(cache, onsuccess=None, onfail=None):
    while True:
        request = yield
        if request in cache and onsuccess:
            onsuccess.send(cache[request])
        elif onfail:
            onfail.send(request)


@coroutine
def load_balancer(*workers):
    while True:
        for worker in workers:
            request = yield
            worker.send(request)


@coroutine
def worker(cache, response, next=None):
    while True:
        request = yield
        cache[request] = response
        if next:
            next.send(response)


cache = {}
response_logger = logger("Response")
cluster = load_balancer(
    logger("Worker 1", worker(cache, 1, response_logger)),
    logger("Worker 2", worker(cache, 2, response_logger)),
    logger("Worker 3", worker(cache, 3, response_logger)),
)
cluster = cache_checker(cache, response_logger, cluster)
cluster = logger("Request", cluster)


if __name__ == "__main__":
    from random import randint


    for i in range(20):
        cluster.send(randint(1, 5))

[Figure: distributed computing system mock]
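
A run prints something like the following excerpt; the requests are random, so your output will differ:

Request: 3
Worker 1: 3
Response: 1
Request: 3
Response: 1
Request: 5
Worker 2: 5
Response: 2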

To start loving coroutines, try to implement the same system without them. Of course, you can write classes that store state in attributes and do the work in a send method:

class worker(object):

    def __init__(self, cache, response, next=None):
        self.cache = cache
        self.response = response
        self.next = next

    def send(self, request):
        self.cache[request] = self.response
        if self.next:
            self.next.send(self.response)

But I dare you to find a beautiful implementation of the load balancer this way!
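
For the record, here is a sketch of a class-based round-robin balancer using itertools.cycle; it works, but the iteration state has to be managed by hand, which is exactly what the coroutine version gets for free:

import itertools


class load_balancer(object):

    def __init__(self, *workers):
        self.workers = itertools.cycle(workers)  # Explicit round-robin state

    def send(self, request):
        next(self.workers).send(request)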

I hope I have persuaded you that coroutines are cool. If you are going to try them, take a look at my library, CoPipes. It is helpful for building really big and complicated data processing pipelines. Your feedback is welcome.

When I hear “It should be done this way,” I always ask “Why?” Indeed, if everyone believed in design patterns but did not understand them, engineering would be yet another religion.

So what about “Fat model, skinny controller” (and vice versa) in the MVC pattern? Where do the controller’s responsibilities end and the model’s begin?

Well, the main goals of every design pattern are encapsulation, isolation, and code reusability. The MVC pattern consists of three parts. The model is responsible for business rules and data management. The view is responsible for representing the model in the user interface. And the controller binds the model and the view, i.e. makes them work together. “Thank you, Captain Obvious,” you would say, but keep in mind the key point: the model knows nothing about the controller and the view.

Imagine a typical web application. You have carefully implemented HTTP request handling, HTML generation, user sessions, and other web-related things. Now the client asks: “Look, it works great. But we need to interact with the application over an RPC protocol. Could you implement that?” Would you have to change the code of your models? Would some of the code become unusable, i.e. code that is used on the web only? If you answered “yes,” you should change your design approach.

The model should be reusable in any interface without modification: web, RPC, shell, desktop, mobile, you name it. Therefore, any interface-specific code should be placed in the controller (or the view). By the way, TDD helps keep models clean: if you test models in isolation, there will be no unnecessary code, because the test environment is yet another interface!

However, there is a nuance: data validation. On one hand, it is clearly the model’s job. On the other, it can cause significant performance overhead in batch processing from a trusted source. I think the validation layer should be separate from the model layer, or at least switchable. And of course, it should not be mixed with the sanitization layer. For example, protecting against XSS attacks is the web controller’s job, while protecting against SQL injection is the model’s. The validation layer (when it is needed) belongs somewhere between these two. The solution depends on the task; there is no silver bullet.
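
A minimal sketch of what “switchable” can mean, reusing the validated proxy from the recipe above (trusted_records is a hypothetical batch from a trusted source):

# Untrusted path, e.g. a web controller: full validation
user.validated.login(request_params)

# Trusted path, e.g. a batch import: no validation overhead
for record in trusted_records:
    user.login(**record)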

In a nutshell, keep models clean and reusable. Will they be fat or skinny? Nobody cares. Make them play sports and they will be in good shape!