Our previous attempt to compare the performance of Python implementations of protocol buffers and thrift serialization generated interesting feedback and suggestions. The main request we received was to broaden the coverage of the previous benchmark so as to cover the full range of available options…
We are not there yet, but the benchmark now allows comparing 5 different frameworks:
- protocol buffers
- thrift
- capnp proto (using the pycapnp package)
- json (standard library package)
- msgpack
Comparing apples with oranges
Following our previous post, several people suggested that we have a look at the capnp proto serialization system, which is very similar in principle to the thrift and protocol buffers systems we compared earlier. This similarity allowed us to get it covered by the previous benchmark in no time, and at first capnp proto performance looked astonishingly good.
We refrained from publishing those results however, as something did not look right. If we were to believe what was reported, capnp deserialization time did not depend on the size or type of the messages being processed. Assuming we had misunderstood how to use the pycapnp extension package we were leveraging, we contacted Jason Paryani, who maintains it, to ask if he could suggest anything.
Jason explained that our results did not surprise him: with capnp, real deserialization only takes place when the inner content of a message is accessed. Jason also observed that our previous approach to timing serialization was probably favoring capnp unrealistically, as part of the serialization happens when the message content is set.
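To see why timing only the "deserialize" call is misleading for a lazy framework, here is a minimal sketch of the behaviour Jason describes. The `LazyMessage` class is a toy stand-in we made up for illustration (it is not pycapnp's actual API): the expensive parsing is deferred until a field is first read.

```python
import json


class LazyMessage:
    """Toy stand-in for a lazy deserializer: "deserializing" just
    keeps the raw buffer; parsing happens on first field access."""

    def __init__(self, raw_bytes):
        self._raw = raw_bytes      # cheap: no parsing done here
        self._parsed = None

    def __getattr__(self, name):
        # called only for attributes not set in __init__,
        # i.e. the message fields
        if self._parsed is None:
            self._parsed = json.loads(self._raw)  # real work deferred to here
        return self._parsed[name]


raw = json.dumps({"user": "alice", "score": 10}).encode()
msg = LazyMessage(raw)   # near-instant, whatever the message size
user = msg.user          # parsing actually happens now -> "alice"
```

A benchmark that stops the clock right after `LazyMessage(raw)` would report a deserialization time that is independent of message size, which is exactly the anomaly we observed.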
In short, to allow a fair comparison between the different frameworks we wanted to cover, and not be fooled by implementation choices made by library developers, Jason advised revising our benchmarking approach, replacing:
- serialize with construct & serialize
- deserialize with deserialize & traverse
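The revised approach can be sketched as follows, here with the standard-library json module standing in for any of the benchmarked frameworks (the message content and iteration count are arbitrary examples, not the benchmark's actual payload):

```python
import json
import timeit


def construct_and_serialize():
    # build the message from scratch every time, so frameworks that do
    # part of the serialization work at construction time are measured too
    message = {"id": 42, "tags": ["a", "b"], "nested": {"value": 3.14}}
    return json.dumps(message)


def traverse(obj):
    # visit every value, forcing lazy frameworks to actually decode
    # the whole message instead of just wrapping the raw buffer
    if isinstance(obj, dict):
        for value in obj.values():
            traverse(value)
    elif isinstance(obj, list):
        for value in obj:
            traverse(value)


def deserialize_and_traverse(payload):
    traverse(json.loads(payload))


payload = construct_and_serialize()
t_ser = timeit.timeit(construct_and_serialize, number=10_000)
t_deser = timeit.timeit(lambda: deserialize_and_traverse(payload), number=10_000)
```

Each framework plugged into the benchmark implements the same two operations, so whatever a library does eagerly or lazily ends up inside the timed region.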
The new approach is probably not a very good one if one wants to establish the absolute performance of a single framework. For example, the full-traversal requirement will unnecessarily hurt the deserialization figures of the json or msgpack libraries, which deliver fully deserialized dictionaries in one shot.
We believe however that by having each benchmark perform the same duties, we make meaningful comparisons possible. We invite any interested individual to review the current benchmark and let us know what could be done to improve the fairness of the comparisons we are trying to make.
You may run the benchmarks on your side and send us the results for publication on the GitHub project. The machines we are relying on are low-end ones… :)
The two top performers are thrift and the new entrant msgpack. There is no point in trying to pick a winner between these two, as they are not playing in the same category (schema-based versus schemaless systems…).