Binary payloads

While it's usually not necessary, if your microservice deals with a lot of data, using an alternative binary format can be an attractive option to improve performance and decrease the required network bandwidth without having to rely on GZIP.

Two widely used binary formats are Protocol Buffers (protobuf) and MessagePack.

Protocol Buffers (https://developers.google.com/protocol-buffers) requires you to describe the data being exchanged in a schema that will be used to index the binary content.

It adds some work, because all transferred data needs to be described in a schema, and you will need to learn a new Domain-Specific Language (DSL).

The following example is taken from the protobuf documentation:

    package tutorial;

    message Person {
      required string name = 1;
      required int32 id = 2;
      optional string email = 3;

      enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
      }

      message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
      }

      repeated PhoneNumber phones = 4;
    }

    message AddressBook {
      repeated Person people = 1;
    }

Needless to say, it's not very Pythonic and looks more like a database schema. We could argue that describing the data that gets transferred is good practice, but it could become a bit redundant with the Swagger definition if the microservice uses that.

MessagePack (http://msgpack.org/), on the other hand, is schemaless and can compress and uncompress your data by just calling a function.

It's a simple alternative to JSON, and has implementations in most languages. The msgpack-python library (installed using the pip install msgpack-python command) offers the same level of integration as JSON:

>>> import msgpack 
>>> data = {"this": "is", "some": "data", 1: 2}
>>> msgpack.dumps(data)
b'\x83\x01\x02\xa4this\xa2is\xa4some\xa4data'
>>> msgpack.loads(msgpack.dumps(data))
{1: 2, b'this': b'is', b'some': b'data'}

Notice that strings are converted into bytes when the data is serialized and then deserialized with the default serializer. This is something to take into account if you need to keep the original types.

Clearly, using MessagePack is quite simple compared to Protobuf--but which one is faster and provides the better compression ratio depends a lot on your data. In some rare cases, plain JSON might even be quicker to serialize than a binary format.

In terms of compression, you can expect a 10% to 20% size reduction with MessagePack, but if your JSON contains a lot of strings--which is often the case in microservices--GZIP will do a much better job.
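You can verify this on your own payloads with the standard library alone. The following sketch (the payload is made up for illustration) compresses a string-heavy JSON document with GZIP:

```python
import gzip
import json

# A hypothetical string-heavy payload, similar to typical microservice JSON
payload = [
    {"user": "user-%d" % i, "email": "user-%d@example.com" % i, "status": "active"}
    for i in range(500)
]

raw = json.dumps(payload).encode("utf8")
compressed = gzip.compress(raw)

# Repetitive strings compress very well with GZIP
print(len(raw), len(compressed))
```

Because the keys and values repeat heavily, the compressed size ends up a small fraction of the original.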

In the following example, a JSON payload of 87 KB containing a lot of strings is converted using MessagePack, and then both versions are gzipped:

>>> import json, gzip, msgpack
>>> with open('data.json') as f:
...     data = f.read()
...
>>> python_data = json.loads(data)
>>> len(json.dumps(python_data))
88983
>>> len(msgpack.dumps(python_data))
60874
>>> len(gzip.compress(json.dumps(python_data).encode('utf8')))
5925
>>> len(gzip.compress(msgpack.dumps(python_data)))
5892

Using MessagePack reduces the size of the payload by quite a lot, but GZIP is crushing it by making it 15 times smaller with both JSON and MessagePack payloads!

It's clear that whatever format you are using, the best way to reduce payload sizes is to use GZIP--and if your web server does not deal with decompression, it's straightforward in Python thanks to gzip.decompress().
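The full round trip is just two function calls on each side. A minimal sketch, with a made-up document:

```python
import gzip
import json

document = {"message": "hello", "count": 3}

# Producer side: serialize to JSON, then compress the bytes
body = gzip.compress(json.dumps(document).encode("utf8"))

# Consumer side: decompress, then deserialize
restored = json.loads(gzip.decompress(body).decode("utf8"))

print(restored == document)
```

In an HTTP context, the same calls apply to the request or response body, with the Content-Encoding: gzip header signaling the compression.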

Now, between MessagePack and JSON, the binary format is usually faster and more Python-friendly. For instance, if you pass a Python dictionary with integer keys, JSON will convert them into strings, while MessagePack will do the right thing:

>>> import msgpack, json 
>>> json.loads(json.dumps({1: 2}))
{'1': 2}
>>> msgpack.loads(msgpack.dumps({1: 2}))
{1: 2}

But there's also the problem of date representations: datetime objects are not directly serializable in either JSON or MessagePack, so you need to make sure you convert them.
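With JSON, one way to handle this is to pass a default callback to json.dumps that converts datetime objects to ISO 8601 strings. The helper name below is arbitrary:

```python
import json
from datetime import datetime, timezone

def encode_datetime(obj):
    # Fallback serializer: json.dumps calls this for types it cannot handle
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError("Cannot serialize %r" % type(obj))

event = {"name": "signup", "when": datetime(2017, 1, 1, tzinfo=timezone.utc)}
payload = json.dumps(event, default=encode_datetime)
print(payload)
```

The consumer then parses the string back into a datetime, for example with datetime.fromisoformat() on Python 3.7+.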

In any case, in a world of microservices where JSON is the most accepted standard, sticking with string keys and taking care of dates are minor annoyances in exchange for a universally adopted format.

Unless all your services are in Python with well-defined structures, and you need to speed up the serialization steps as much as possible, it's probably simpler to stick with JSON.