One thing I’ve never really messed with before is the [[clang::musttail]] compiler attribute. Tail calls are when you exit a function by calling another function as the final thing within the function. For example:
    int foo(float x);

    int bar(float x) {
      x += 42.0f;
      return foo(x);
    }

In the above example, the call to foo is a tail call within the function bar. Compilers can take advantage of this by doing something called tail-call optimization, which lets the compiler reuse the current stack frame instead of pushing a new one, and turn the call into a jump.
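Tail-call optimization is normally something the compiler may or may not do. With [[clang::musttail]] on the return statement, clang either emits the tail call or fails the build. Here is a minimal sketch of that. The function sum_to and the MUSTTAIL macro are names made up for illustration, and the macro falls back to a plain return on compilers that don't know the attribute, so the guarantee is lost there:

```cpp
#include <cstdint>

// MUSTTAIL is a hypothetical helper macro: it expands to the attribute
// when the compiler reports support for it, and to nothing otherwise.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

// Sums 1..n into acc. The recursive call is the last thing the function
// does, and caller and callee share a signature, so it is a valid
// candidate for a guaranteed tail call.
std::uint64_t sum_to(std::uint64_t n, std::uint64_t acc) {
  if (n == 0) {
    return acc;
  }
  MUSTTAIL return sum_to(n - 1, acc + n);
}
```

When the attribute is in effect the recursion runs in constant stack space, even in an unoptimized build. Without it you are back to hoping the optimizer does the right thing.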
I tried (and failed) to convince the LLVM folks to allow for runtime-togglable asserts. No biggie - the people much more involved with maintaining upstream didn’t want to have yet another codepath to maintain. They said that it’d cost 20% runtime performance to have asserts enabled, and that this cost would likely still be paid even if I did have asserts switched off at runtime. Why, you might ask? Because LLVM guards a bunch of assert-checking code behind NDEBUG preprocessor checks, and stores extra sideband information when it is built with asserts enabled so that it can do some deeper checks.
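To make that last point concrete, here is a small sketch of the general pattern. It is not LLVM code: SmallList and its members are made up for illustration. The thing to notice is that the extra bookkeeping is compiled into the data structure itself, so a runtime switch could skip the check but not the storage or the updates:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container showing NDEBUG-guarded sideband data.
class SmallList {
public:
  void push(int value) {
    items.push_back(value);
#ifndef NDEBUG
    ++mutations; // extra bookkeeping that only exists in assert builds
#endif
  }

  int at(std::size_t index) const {
    // Compiled out entirely when NDEBUG is defined.
    assert(index < items.size() && "index out of range");
    return items[index];
  }

#ifndef NDEBUG
  // Only available in assert builds, for deeper consistency checks.
  std::size_t mutation_count() const { return mutations; }
#endif

private:
  std::vector<int> items;
#ifndef NDEBUG
  std::size_t mutations = 0; // makes the object bigger in assert builds
#endif
};
```

An assert build of this class has a different layout and does more work on every push, whether or not anything ever reads the counter.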
I was talking with someone today who really, really wanted the sqrtps instruction to be used in some code they were writing. Because of a quirk with clang (still there as of clang 18.1.0), if you happened to use -ffast-math, clang would butcher the use of the intrinsic. So for the code:

    #include <xmmintrin.h>

    __m128 test(const __m128 vec) {
      return _mm_sqrt_ps(vec);
    }

Clang would compile it correctly without fast-math:

    test:                                   # @test
            sqrtps  xmm0, xmm0
            ret

And create this monstrosity with -ffast-math:
When I interviewed at Epic Games it was for a graphics post - I wanted to get back to working on shader compilers. But even though most of my interviews were with the fantastic graphics side of the company, I had a few interviews about something I knew very little about - the Verse language. And in one of those interviews I was asked about something I hadn’t thought about for 15 years - Transactional Memory.
I’ve updated my C/C++ open source libraries utest.h, utf8.h, ubench.h, hashmap.h, subprocess.h, and json.h to use the new Apple Silicon GitHub CI runners.
So how hard is it? Simple! You just add macos-14 to the build -> strategy -> matrix. I took the opportunity to drop macos-latest (which is still set to macos-13, the last x86 runner) and explicitly use the oldest supported runner, macos-11, instead.
The new Apple Silicon runner is roughly 2x faster than the x86 one too - nice!
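For reference, the matrix change described above might look roughly like this. It is a sketch, not the actual workflow from those repositories: the checkout step is a placeholder, and a real matrix would likely list other operating systems alongside the two macOS runners:

```yaml
# Hypothetical workflow fragment; only the two macOS runners come from the post.
jobs:
  build:
    strategy:
      matrix:
        os: [macos-11, macos-14]  # oldest supported x86 runner, plus Apple Silicon
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4  # placeholder step
```

Naming the runner versions explicitly means the job won't silently change architecture when macos-latest is eventually repointed.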