json.h performance (update!)
So previously I tested my json.h library’s performance against other JSON libraries in the wild (json.h performance (vs other C/C++ JSON parsers). My JSON parser wasn’t performing as I’d expect in the tests, so I’ve spent a good amount of time looking at why.
Before we begin, you can find the sources to my public domain, one .c/one .h file json.h library here - https://github.com/sheredom/json.h
A little information on my initial approach. The idea was that we would try and keep the values that make up arrays and objects together in memory. Using the worked example;
{“a” : [123, null, true, false, “alphabet”]}
Would produce the following DOM structure for the above JSON;
In the above diagram, we can see that the one json_object_s contains an array to the names and values contained within the object, and the one json_array_s contains an array to the values.
In practice though, it was this design decision that caused the main performance bottleneck of the approach - effectively each time you came to a new object or array, you’d have to first skip over all the contents of the object or array simply to first find out the number of values within! In big O notation, our worst case was O(n²) for parsing any given JSON file.
To fix this issue, we need to decouple the allocation of the memory to store the values contained within objects and arrays such that we can create a single value, fill out of the information about the contents of that value, and then allocate the next value. Effectively we need to sacrifice our array and instead have a linked-list of values.
To do this I’ve introduced json_array_element_s and json_object_element_s structures which form the new linked-list to the json.h library, and it is through these that you’ll have to iterate when traversing the DOM. Our worst case now becomes O(n) for parsing a JSON file.
Our new DOM for the above worked example is;
Our new DOM is admittedly not as concise or pretty, but performance wise - it is a winner.
I used three files, parsed and traversed them, and recorded the results in my previous post. I’ve updated the graphs below but this time added in a new row ‘json.h - old’ for the previous approach, and json.h is using the newer approach.
As can be seen from the three charts, our performance is massively improved over the previous approach, an easy 3x+ faster overall. gason still performs the best of all the libraries benchmarked, but at least our parser is now in the region of the other parsers in terms of performance, non-withstanding the massive philosophical design difference that json.h decided on.
I hope you enjoyed this post, and I’ll keep beavering away on my json.h library to improve it further!