Extracting the json.h README.md Code Samples For Compiling
One thing that has always worried me with writing C samples in documentation for my single-header libraries is that you can’t be 100% sure that they will compile successfully. You can always extract them by hand and run them, but you might change them later and forget to re-test. Having this be automatic is so powerful. Rust has this feature built in to the ecosystem so that all code samples are tested automagically.
So I wondered - is there any way to do something similar for my single-header libraries? I decided to try with json.h.
The Code Samples
I want to have more and more code samples to show how easy the library is to use, but the big one there currently is:
const char json[] = "{\"a\" : true, \"b\" : [false, null, \"foo\"]}";
struct json_value_s* root = json_parse(json, strlen(json));
assert(root->type == json_type_object);
struct json_object_s* object = (struct json_object_s*)root->payload;
assert(object->length == 2);
struct json_object_element_s* a = object->start;
struct json_string_s* a_name = a->name;
assert(0 == strcmp(a_name->string, "a"));
assert(a_name->string_size == strlen("a"));
struct json_value_s* a_value = a->value;
assert(a_value->type == json_type_true);
assert(a_value->payload == NULL);
struct json_object_element_s* b = a->next;
assert(b->next == NULL);
struct json_string_s* b_name = b->name;
assert(0 == strcmp(b_name->string, "b"));
assert(b_name->string_size == strlen("b"));
struct json_value_s* b_value = b->value;
assert(b_value->type == json_type_array);
struct json_array_s* array = (struct json_array_s*)b_value->payload;
assert(array->length == 3);
struct json_array_element_s* b_1st = array->start;
struct json_value_s* b_1st_value = b_1st->value;
assert(b_1st_value->type == json_type_false);
assert(b_1st_value->payload == NULL);
struct json_array_element_s* b_2nd = b_1st->next;
struct json_value_s* b_2nd_value = b_2nd->value;
assert(b_2nd_value->type == json_type_null);
assert(b_2nd_value->payload == NULL);
struct json_array_element_s* b_3rd = b_2nd->next;
assert(b_3rd->next == NULL);
struct json_value_s* b_3rd_value = b_3rd->value;
assert(b_3rd_value->type == json_type_string);
struct json_string_s* string = (struct json_string_s*)b_3rd_value->payload;
assert(0 == strcmp(string->string, "foo"));
assert(string->string_size == strlen("foo"));
/* Don't forget to free the one allocation! */
free(root);
I want to be able to parse the README.md, extract the code samples, and turn them into a test that my utest.h library will run. I already use CMake for building the unit tests, and given that I’m pretty familiar with it (despite its glaring flaws), I wondered if I could use it to do the extraction.
CMake of Horrors
CMake has built-in regex support for strings, so I wondered whether I could use that to do the extraction. The one big issue is that CMake only supports greedy regex matching - meaning that I have to be super careful when choosing the start/end tokens to match on.
First of all we need to read the whole file into a CMake variable:
file(READ ${CMAKE_CURRENT_SOURCE_DIR}/../README.md readme_md)
CMake has this wonderfully messed up method for differentiating between strings and lists - where a list in CMake terminology is just a string that has semi-colons within it. The problem is that code samples in languages like C use semi-colons as end of statement terminators - which will cause us issues. The best way I’ve found around this is to change the semi-colons to some symbol that wouldn’t appear in the original source. I used the ‘@’ symbol for this since there isn’t an operator in C for it:
string(REPLACE ";" "@" readme_md "${readme_md}")
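The placeholder round trip can be illustrated outside CMake as well. The following is a minimal sketch in C, not the code used in the build; the helper names `swap_char` and `placeholder_is_safe` are hypothetical. It also shows the one thing to watch for: the swap is only lossless if the placeholder does not already appear in the text, for example inside a string literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every `from` character with `to`, in place.
   Returns the number of characters that were changed. */
static size_t swap_char(char *text, char from, char to) {
  size_t changed = 0;
  assert(text != NULL);
  for (; *text != '\0'; ++text) {
    if (*text == from) {
      *text = to;
      ++changed;
    }
  }
  return changed;
}

/* The round trip ';' -> placeholder -> ';' only restores the original
   text when the placeholder does not occur in it beforehand. */
static int placeholder_is_safe(const char *text, char placeholder) {
  assert(text != NULL);
  return strchr(text, placeholder) == NULL;
}
```

Checking `placeholder_is_safe(readme, '@')` before swapping would catch a README whose samples happen to contain an ‘@’, which would otherwise come back out as a stray semi-colon.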
Ok now we have the string as a real string (non-list) we can extract the code samples themselves. You’ll notice that in the README.md all code samples begin with “```c” and end with “```”. So we can use this to look for our code.
As I said earlier, CMake is greedy when it comes to regex, which means that if we used the more natural “.*” we’d match from the very first code sample to the very last in the file. Not ideal. Instead we need to use the more constrained search of “```c[^`]*```” - search for the start pattern, and then all symbols except a “`” until we get to our end. This stores each match as a list entry in the variable snippets - meaning we have introduced some semi-colons into the string too:
string(REGEX MATCHALL "```c[^`]*```" snippets "${readme_md}")
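To make the matching rule concrete, here is a rough C sketch of the same idea using plain `strstr` instead of a regex: find an opening fence, then stop at the first closing fence after it rather than the last one in the file. This is an illustration only, and `next_c_snippet` and `count_c_snippets` are hypothetical names. It differs slightly from the regex, which rejects any snippet containing a backtick, whereas this version only stops at three in a row. Like the regex, it would also pick up a fence tagged “```cpp”, since that starts with “```c”.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the next ```c ... ``` block at or after `cursor`.
   On success, *body and *body_len describe the text between the fences and
   the return value points just past the closing fence.
   Returns NULL when no complete block remains. */
static const char *next_c_snippet(const char *cursor, const char **body,
                                  size_t *body_len) {
  static const char open_fence[] = "```c";
  static const char close_fence[] = "```";
  const char *start;
  const char *end;

  assert(cursor != NULL && body != NULL && body_len != NULL);

  start = strstr(cursor, open_fence);
  if (start == NULL) {
    return NULL;
  }
  start += strlen(open_fence);

  /* Stop at the FIRST closing fence: the non-greedy behaviour we want. */
  end = strstr(start, close_fence);
  if (end == NULL) {
    return NULL;
  }

  *body = start;
  *body_len = (size_t)(end - start);
  return end + strlen(close_fence);
}

/* Count the complete C snippets in a markdown string. */
static size_t count_c_snippets(const char *markdown) {
  const char *body;
  size_t body_len;
  size_t count = 0;
  while ((markdown = next_c_snippet(markdown, &body, &body_len)) != NULL) {
    ++count;
  }
  return count;
}
```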
Now, to be able to test the examples, we want to compile each of the code snippets in isolation from each other. I first attempted to create a UTEST(foo, bar) wrapper around each snippet, but I could not figure out how to create these wrappers such that they would be unique. What I mean is that the first snippet would be UTEST(generated, snippet0), the next UTEST(generated, snippet1), etc. For the life of me I couldn’t work out how this was possible. So instead I just wrapped each snippet in its own braced region, which guarantees their isolation:
string(REPLACE "```c" "{" snippets ${snippets})
string(REPLACE "```" "}\n\n" snippets ${snippets})
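The reason the braces are enough is that each pair opens a new block scope in C, so two snippets can declare the same names without clashing. A small self-contained sketch (the function name `run_isolated_snippets` is made up for illustration):

```c
#include <assert.h>
#include <string.h>

/* Two "snippets" that both declare a variable called `json`.
   Each braced region is its own scope, so the second declaration
   does not conflict with the first. */
static int run_isolated_snippets(void) {
  int snippets_run = 0;

  {
    const char json[] = "{\"a\" : true}";
    assert(strlen(json) == 12);
    ++snippets_run;
  }

  {
    const char json[] = "[1, 2, 3]";
    assert(strlen(json) == 9);
    ++snippets_run;
  }

  return snippets_run;
}
```

Without the inner braces the second `json` would be a redefinition in the same scope and the file would fail to compile.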
Now all we need to do is remove the semi-colons that were added for the lists, and then turn all “@” symbols we introduced before back into semi-colons:
string(REPLACE ";" "" snippets "${snippets}")
string(REPLACE "@" ";" snippets "${snippets}")
And then we just need to write out the file into some location for inclusion:
file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/generated.h "${snippets}")
Sample generated.h
For the current master json.h, the generated.h file is:
{
struct json_value_s *json_parse(
const void *src,
size_t src_size);
}
{
struct json_value_s {
void *payload;
size_t type;
};
}
{
struct json_value_s *json_parse_ex(
const void *src,
size_t src_size,
size_t flags_bitset,
void*(*alloc_func_ptr)(void *, size_t),
void *user_data,
struct json_parse_result_s *result);
}
{
enum json_parse_flags_e {
json_parse_flags_default = 0,
json_parse_flags_allow_trailing_comma = 0x1,
json_parse_flags_allow_unquoted_keys = 0x2,
json_parse_flags_allow_global_object = 0x4,
json_parse_flags_allow_equals_in_object = 0x8,
json_parse_flags_allow_no_commas = 0x10,
json_parse_flags_allow_c_style_comments = 0x20,
json_parse_flags_deprecated = 0x40,
json_parse_flags_allow_location_information = 0x80,
json_parse_flags_allow_single_quoted_strings = 0x100,
json_parse_flags_allow_hexadecimal_numbers = 0x200,
json_parse_flags_allow_leading_plus_sign = 0x400,
json_parse_flags_allow_leading_or_trailing_decimal_point = 0x800,
json_parse_flags_allow_inf_and_nan = 0x1000,
json_parse_flags_allow_multi_line_strings = 0x2000,
json_parse_flags_allow_simplified_json =
(json_parse_flags_allow_trailing_comma |
json_parse_flags_allow_unquoted_keys |
json_parse_flags_allow_global_object |
json_parse_flags_allow_equals_in_object |
json_parse_flags_allow_no_commas),
json_parse_flags_allow_json5 =
(json_parse_flags_allow_trailing_comma |
json_parse_flags_allow_unquoted_keys |
json_parse_flags_allow_c_style_comments |
json_parse_flags_allow_single_quoted_strings |
json_parse_flags_allow_hexadecimal_numbers |
json_parse_flags_allow_leading_plus_sign |
json_parse_flags_allow_leading_or_trailing_decimal_point |
json_parse_flags_allow_inf_and_nan |
json_parse_flags_allow_multi_line_strings)
};
}
{
const char json[] = "{\"a\" : true, \"b\" : [false, null, \"foo\"]}";
struct json_value_s* root = json_parse(json, strlen(json));
assert(root->type == json_type_object);
struct json_object_s* object = (struct json_object_s*)root->payload;
assert(object->length == 2);
struct json_object_element_s* a = object->start;
struct json_string_s* a_name = a->name;
assert(0 == strcmp(a_name->string, "a"));
assert(a_name->string_size == strlen("a"));
struct json_value_s* a_value = a->value;
assert(a_value->type == json_type_true);
assert(a_value->payload == NULL);
struct json_object_element_s* b = a->next;
assert(b->next == NULL);
struct json_string_s* b_name = b->name;
assert(0 == strcmp(b_name->string, "b"));
assert(b_name->string_size == strlen("b"));
struct json_value_s* b_value = b->value;
assert(b_value->type == json_type_array);
struct json_array_s* array = (struct json_array_s*)b_value->payload;
assert(array->length == 3);
struct json_array_element_s* b_1st = array->start;
struct json_value_s* b_1st_value = b_1st->value;
assert(b_1st_value->type == json_type_false);
assert(b_1st_value->payload == NULL);
struct json_array_element_s* b_2nd = b_1st->next;
struct json_value_s* b_2nd_value = b_2nd->value;
assert(b_2nd_value->type == json_type_null);
assert(b_2nd_value->payload == NULL);
struct json_array_element_s* b_3rd = b_2nd->next;
assert(b_3rd->next == NULL);
struct json_value_s* b_3rd_value = b_3rd->value;
assert(b_3rd_value->type == json_type_string);
struct json_string_s* string = (struct json_string_s*)b_3rd_value->payload;
assert(0 == strcmp(string->string, "foo"));
assert(string->string_size == strlen("foo"));
/* Don't forget to free the one allocation! */
free(root);
}
{
const char json[] = "{\"a\" : true, \"b\" : [false, null, \"foo\"]}";
struct json_value_s* root = json_parse(json, strlen(json));
struct json_object_s* object = json_value_as_object(root);
assert(object != NULL);
assert(object->length == 2);
struct json_object_element_s* a = object->start;
struct json_string_s* a_name = a->name;
assert(0 == strcmp(a_name->string, "a"));
assert(a_name->string_size == strlen("a"));
struct json_value_s* a_value = a->value;
assert(json_value_is_true(a_value));
struct json_object_element_s* b = a->next;
assert(b->next == NULL);
struct json_string_s* b_name = b->name;
assert(0 == strcmp(b_name->string, "b"));
assert(b_name->string_size == strlen("b"));
struct json_array_s* array = json_value_as_array(b->value);
assert(array->length == 3);
struct json_array_element_s* b_1st = array->start;
struct json_value_s* b_1st_value = b_1st->value;
assert(json_value_is_false(b_1st_value));
struct json_array_element_s* b_2nd = b_1st->next;
struct json_value_s* b_2nd_value = b_2nd->value;
assert(json_value_is_null(b_2nd_value));
struct json_array_element_s* b_3rd = b_2nd->next;
assert(b_3rd->next == NULL);
struct json_string_s* string = json_value_as_string(b_3rd->value);
assert(string != NULL);
assert(0 == strcmp(string->string, "foo"));
assert(string->string_size == strlen("foo"));
/* Don't forget to free the one allocation! */
free(root);
}
I wanted to keep normal assert.h asserts in the sample source, but I also want these to be turned into my utest.h ASSERT_TRUE macros, so I just use the preprocessor to redefine assert, and include the generated source in the test:
#define assert(x) ASSERT_TRUE(x)
UTEST(generated, readme) {
#include "generated.h"
}
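The macro trick can be tried without utest.h. In the sketch below, `CHECK_TRUE` is a hypothetical stand-in for ASSERT_TRUE that just counts results, and the `#undef` before the redefinition is an added precaution in case assert.h has already been included earlier in the translation unit.

```c
#include <string.h>

static int checks_passed = 0;
static int checks_failed = 0;

/* Hypothetical stand-in for utest.h's ASSERT_TRUE: counts instead of
   aborting, so a failing check is recorded rather than fatal. */
#define CHECK_TRUE(x) ((x) ? (void)++checks_passed : (void)++checks_failed)

/* Redirect assert for the code that follows. */
#undef assert
#define assert(x) CHECK_TRUE(x)

static void run_sample(void) {
  const char text[] = "foo";
  assert(strlen(text) == 3);
  assert(0 == strcmp(text, "foo"));
  assert(text[0] == 'x'); /* deliberately false: recorded, not fatal */
}

/* Restore the standard assert; assert.h may be included more than once. */
#undef assert
#include <assert.h>
```

After `run_sample()` the counters hold two passes and one failure, and the sample source itself never had to mention anything other than plain `assert`.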
And the output when I run?
[ RUN ] generated.readme
[ OK ] generated.readme (9715ns)
A pass!
Conclusion
Ok - it is not as nice as what Rust has built-in, but it works! I can now modify the README.md and be sure that the code compiles correctly. I even found a bug in the sample in the process, so well worth the work. Just a shame I had to invest in proper demonology to support this within the C eco-system.