utest.h Now Supports Whole Program Optimization
A user of utest.h, my unit testing framework for C/C++, posted an intriguing issue: with a release build in Visual Studio no tests were being run, but with a debug build they were. Before I even got to the issue, they worked out that whole program optimization was the cause and closed it themselves.
But this got me thinking - there is no reason why my library shouldn’t work with whole program optimization enabled, so why was this behaving badly?
Visual Studio’s Whole Program Optimization
Enabling whole program optimization in Visual Studio requires the /GL option on the cl.exe compiler and /LTCG on link.exe. This effectively defers some of the optimization work to link time, where the toolchain can optimize across multiple compilation units to (in theory!) produce a more optimal executable.
In CMake you can enable this for an executable like so:
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC")
target_compile_options(utest_test_wpo PRIVATE "$<$<CONFIG:RELEASE>:/GL>")
target_link_options(utest_test_wpo PRIVATE "$<$<CONFIG:RELEASE>:/LTCG>")
endif()
What the above does is turn on the options only for release builds of the given target. The first thing I did when trying to work out this issue was to duplicate the existing tests I had for utest and build them as a separate target, utest_test_wpo. I used the above CMake to turn on whole program optimization, configured CMake for release, and got the following output:
[==========] Running 0 test cases.
[==========] 0 test cases ran.
[ PASSED ] 0 tests.
So this matched what the user found at least!
The Problem
So the issue boils down to a clever little trick I use in utest.h: how it discovers tests. For those that aren’t aware, my utest.h library looks for tests in a specific format:
UTEST(from, c) {
ASSERT_TRUE(1);
}
And allows you to run them. This works across C and C++ files: if you have T tests spread across F files, when you run the produced executable it’ll run all T tests, even though they started in separate files. I don’t use any custom magic to make this work, so how can I reach across compilation units to work out what to run?
I use an idiom that is awful and awesome in equal measure: global constructors. Those used to C++ will be aware that global variables can have constructors to initialize their state. It is general wisdom among those of us burned by the feature to either a) not use global variable constructors at all, or b) if you do use them, make them as dumb as possible. This is because the order in which global variables in different compilation units are initialized is unspecified, so any interdependency between these variables can cause problems.
What is less well known is that the three major C compilers have ways to add global constructors in C code.
Clang/GCC both support the following:
static void f(void) __attribute__((constructor));
You just tag a function as being a constructor using the attribute mechanism, and it’ll be run during the globals initialization phase.
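As a minimal sketch of the attribute in action (assuming a GCC or Clang toolchain; setup and setup_ran are illustrative names, not part of utest.h), the function below is never called by any code in the file, yet a main() added to this file would already see setup_ran set to 1:

```c
/* Set by the constructor below; anything running in main() can check it. */
static int setup_ran = 0;

/* GCC/Clang: the constructor attribute asks the toolchain to call this
   function during global initialization, before main() is entered. */
static void setup(void) __attribute__((constructor));

static void setup(void) {
  setup_ran = 1;
}
```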
MSVC (Visual Studio) is a little more complicated, but you can achieve the same thing with:
#pragma section(".CRT$XCU", read)
static void __cdecl f(void);
__declspec(allocate(".CRT$XCU")) void(__cdecl * f_)(void) = f;
What this does is add a function pointer to a special section, .CRT$XCU, where the global initializers are placed. On startup the C runtime walks this list of function pointers to initialize all the global state. It’s aesthetically uglier than the Clang/GCC approach, but it works!
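The two approaches can be hidden behind one macro. The following is a hypothetical sketch, not utest.h's code: the INITIALIZER macro, counter, first and second are illustrative names, and the MSVC branch is the plain variant shown above, so it is still exposed to the whole program optimization problem this post is about.

```c
/* Hypothetical portable wrapper: INITIALIZER(name) declares a function
   that runs during global initialization, before main(). */
#if defined(_MSC_VER)
/* MSVC requires the section to be declared before __declspec(allocate)
   can place data in it. */
#pragma section(".CRT$XCU", read)
/* Pre-fix variant: the f##_ pointer is never referenced by name, so
   /GL + /LTCG may discard it. */
#define INITIALIZER(f)                                                       \
  static void __cdecl f(void);                                               \
  __declspec(allocate(".CRT$XCU")) void(__cdecl * f##_)(void) = f;           \
  static void __cdecl f(void)
#else
/* GCC/Clang: rely on the constructor attribute. */
#define INITIALIZER(f)                                                       \
  static void f(void) __attribute__((constructor));                          \
  static void f(void)
#endif

static int counter = 0;

/* Both initializers run before main(), in an order we do not depend on. */
INITIALIZER(first) { counter += 1; }
INITIALIZER(second) { counter += 10; }
```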
With this key piece of technology, each UTEST test can use a global constructor to register itself with a global registry of ‘tests that can be run’, effectively just appending itself to a malloc’ed region. And this is fast: on my 2017 MacBook Pro it takes around 50ms to initialize 502 tests for running, around 99us per test. As an aside, I did an exposé of the performance of utest.h, and TL;DR it’s blindingly fast.
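A registry like the one described can be sketched in a few lines of C. This is not utest.h's actual code: register_test, TEST, run_all and the other names are hypothetical, and for brevity it assumes GCC/Clang constructors. Each TEST expands to the test body plus a constructor that appends it to a realloc-grown array, so by the time main() runs, every test in every linked file has registered itself:

```c
#include <stdio.h>
#include <stdlib.h>

typedef void (*test_fn)(void);

struct test_entry {
  const char *name;
  test_fn fn;
};

/* Hypothetical global registry, grown one entry at a time. */
static struct test_entry *registry = NULL;
static size_t registry_count = 0;

static void register_test(const char *name, test_fn fn) {
  /* realloc(NULL, n) behaves like malloc(n), so the first call allocates. */
  struct test_entry *grown =
      realloc(registry, (registry_count + 1) * sizeof *registry);
  if (grown == NULL) {
    fprintf(stderr, "failed to register %s\n", name);
    exit(EXIT_FAILURE);
  }
  registry = grown;
  registry[registry_count].name = name;
  registry[registry_count].fn = fn;
  registry_count++;
}

/* TEST(set, name) defines a test body plus a constructor that registers
   it, so no code ever calls the test function directly by name. */
#define TEST(set, name)                                                      \
  static void set##_##name(void);                                            \
  static void set##_##name##_register(void) __attribute__((constructor));    \
  static void set##_##name##_register(void) {                                \
    register_test(#set "." #name, set##_##name);                             \
  }                                                                          \
  static void set##_##name(void)

static int tests_run = 0;

TEST(c, first) { tests_run++; }
TEST(c, second) { tests_run++; }

/* Run everything that registered itself; returns the number of tests run. */
static size_t run_all(void) {
  for (size_t i = 0; i < registry_count; i++) {
    printf("[ RUN      ] %s\n", registry[i].name);
    registry[i].fn();
  }
  return registry_count;
}
```

A test runner's main() then only needs to call run_all(). Nothing in the program refers to the individual tests by name, which is exactly the property that makes them look dead to an aggressive optimizer.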
Back to the problem - remember the problem? MSVC with whole program optimization was not running any tests. For all intents and purposes these global constructors, like the function pointer f_ above, are not actually used in my code. What I mean is that I never call f_ myself; it is just registered and shoved in the global initializer section. So under whole program optimization the compiler comes along and says ‘Oh hey! All these variables are doing stuff and we don’t use them? Get rid!’. That means none of the global constructors are run, so none of the tests register themselves with the global test registry, and it looks like there is nothing to run.
Pretty smart of the compiler in some respects, but now we need to defeat it.
The Solution
I was recently made aware of a funky feature of the MSVC compiler thanks to an investigation I did into using rpmalloc with LLVM to improve compile times on multithreaded workloads on Windows. @maniccoder, who knows a thing or two about allocators, gave me a gist showing how to do it. The part that was new to me was:
// Make sure symbols are not purged in linker
#pragma comment(linker, "/include:_rpmllinit_")
So you can use a special MSVC-specific pragma to pass options to the linker, and those options can include /include, which forces the linker to keep a symbol even if it thinks the symbol is dead. Seems relevant! Since my UTEST tests are declared through preprocessor macros, I cannot use the #pragma form, because a #pragma directive cannot appear inside a macro body. But I knew I could use MSVC's __pragma() operator to do the same thing within the macro’s body.
So my global test initializer before this change was:
#define UTEST_INITIALIZER(f) \
static void __cdecl f(void); \
__declspec(allocate(".CRT$XCU")) void(__cdecl * f##_)(void) = f; \
static void __cdecl f(void)
And so I just added the __pragma() line to preserve the symbol:
#define UTEST_INITIALIZER(f) \
static void __cdecl f(void); \
__pragma(comment(linker, "/include:" #f "_")); \
__declspec(allocate(".CRT$XCU")) void(__cdecl * f##_)(void) = f; \
static void __cdecl f(void)
And it worked - kinda. The above worked for all global initializers from tests in C files compiled for 64-bit Windows (Win64). It failed for 32-bit builds, and for C++ code.
Starting with the C++ tests, I realised that even though I’d declared the f_ function pointer with the __cdecl calling convention, I hadn’t actually stopped the C++ compiler from mangling the variable’s name, which changes the symbol name the linker sees.
So to fix this I added another macro:
#if defined(__cplusplus)
#define UTEST_C_FUNC extern "C"
#else
#define UTEST_C_FUNC
#endif
This uses extern "C" when C++ is being used. I then modified the initializer to:
#define UTEST_INITIALIZER(f) \
static void __cdecl f(void); \
__pragma(comment(linker, "/include:" #f "_")); \
UTEST_C_FUNC __declspec(allocate(".CRT$XCU")) void(__cdecl * f##_)(void) = f;\
static void __cdecl f(void)
Note that it doesn’t matter what the signature of the f function itself is, only that the function pointer global variable does not end up with a mangled name. This fixed the C++ specific link failure.
The 32-bit Windows failure was a little funky, but I had a vague memory of dealing with this in a previous life. Basically, 32-bit symbols are prefixed with an extra underscore _ on the symbol name. This comes from the 32-bit x86 C name decoration convention, which prefixes C symbol names with an underscore; 64-bit Windows does not do this. I’ve hit this issue before, so to fix it I added:
#if defined(_WIN64)
#define UTEST_SYMBOL_PREFIX
#else
#define UTEST_SYMBOL_PREFIX "_"
#endif
This adds the extra underscore only for 32-bit MSVC builds. I then changed the initializer to:
#define UTEST_INITIALIZER(f) \
static void __cdecl f(void); \
__pragma(comment(linker, "/include:" UTEST_SYMBOL_PREFIX #f "_")); \
UTEST_C_FUNC __declspec(allocate(".CRT$XCU")) void(__cdecl * f##_)(void) = f;\
static void __cdecl f(void)
That fixed all my build problems, and utest.h now supports whole program optimization correctly:
[==========] Running 502 test cases.
[ RUN ] utest_cmdline.filter_with_list
[ OK ] utest_cmdline.filter_with_list (4155002ns)
[ RUN ] c.ASSERT_TRUE
[ OK ] c.ASSERT_TRUE (77ns)
[ RUN ] c.ASSERT_FALSE
[ OK ] c.ASSERT_FALSE (53ns)
...
[==========] 502 test cases ran.
[ PASSED ] 502 tests.
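The linker directive that the final UTEST_INITIALIZER builds is just adjacent string literals glued together, which makes it easy to check outside of MSVC. The sketch below is a hypothetical harness: INCLUDE_DIRECTIVE, WIN64_PREFIX and WIN32_PREFIX stand in for the pieces of UTEST_INITIALIZER and UTEST_SYMBOL_PREFIX, with the 64-bit/32-bit choice passed explicitly instead of tested via _WIN64, so it compiles with any C compiler. For an initializer named my_test, the pointer variable is my_test_, and the 32-bit build asks the linker to keep _my_test_:

```c
/* Hypothetical stand-ins for UTEST_SYMBOL_PREFIX on each platform. */
#define WIN64_PREFIX ""
#define WIN32_PREFIX "_"

/* Same shape as the string passed to __pragma(comment(linker, ...)):
   #f stringifies the initializer name, and the trailing "_" matches the
   f##_ function pointer variable that actually needs preserving. The
   prefix argument is not stringified, so it is macro-expanded first. */
#define INCLUDE_DIRECTIVE(prefix, f) "/include:" prefix #f "_"

static const char win64_directive[] = INCLUDE_DIRECTIVE(WIN64_PREFIX, my_test);
static const char win32_directive[] = INCLUDE_DIRECTIVE(WIN32_PREFIX, my_test);
```

Here win64_directive holds "/include:my_test_" and win32_directive holds "/include:_my_test_".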