I tried (and failed) to convince the LLVM folks to allow for runtime-toggleable asserts. No biggie - the people much more involved with maintaining upstream didn’t want yet another codepath to maintain. They said that enabling asserts costs around 20% in runtime performance, and that this cost would likely still be paid even if I compiled asserts in but switched them off at runtime. Why, you might ask? Because LLVM guards a bunch of assert-checking code behind NDEBUG preprocessor checks, and stores extra sideband information when asserts are enabled so that it can do some deeper checks.
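To make that cost argument concrete, here is a minimal sketch of the pattern. The `Node` type, `node_add_use` function and `asserts_enabled_at_runtime` toggle are all hypothetical, not real LLVM code. The sideband field only exists when NDEBUG is not defined, so a runtime toggle could skip the final check but would still pay for the extra memory and the bookkeeping that feeds it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node type illustrating the pattern, not actual LLVM code. */
typedef struct Node {
  int num_uses;
#ifndef NDEBUG
  /* Sideband copy of the use count, only present in assert-enabled builds. */
  int debug_num_uses;
#endif
} Node;

/* Hypothetical runtime toggle for the asserts. */
static bool asserts_enabled_at_runtime = true;

static void node_add_use(Node *node) {
  node->num_uses += 1;
#ifndef NDEBUG
  /* This bookkeeping still runs when the runtime toggle is off... */
  node->debug_num_uses += 1;
  /* ...only the check itself is skipped. */
  if (asserts_enabled_at_runtime) {
    assert(node->debug_num_uses == node->num_uses);
  }
#endif
}
```

In other words, turning asserts "off at runtime" in an assert-enabled build still leaves every `Node` bigger and every `node_add_use` call doing extra work, which is why the runtime toggle would not recover the 20%.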
I was talking with someone today who really, really wanted sqrtps to be used in some code they were writing. And because of a quirk in clang (still there as of clang 18.1.0), if you happened to use -ffast-math, clang would butcher the use of the intrinsic. So for the code:
```c
#include <xmmintrin.h>

__m128 test(const __m128 vec) { return _mm_sqrt_ps(vec); }
```

Clang would compile it correctly without fast-math:
```asm
test:                                   # @test
        sqrtps  xmm0, xmm0
        ret
```

And it would create this monstrosity with -ffast-math:
When I interviewed for Epic Games it was for a graphics post - I wanted to get back to working on shader compilers. But even though most of my interviews were with the fantastic graphics side of the company, I had a few interviews about something I knew very little about - the Verse language. And in one of those interviews I was asked about something I hadn’t thought about for 15 years - Transactional Memory.
I’ve updated my C/C++ open source libraries utest.h, utf8.h, ubench.h, hashmap.h, subprocess.h, and json.h to use the new Apple Silicon GitHub CI runners.
So how hard is it? Simple! You just add macos-14 to the build -> strategy -> matrix. I took the opportunity to drop macos-latest (which is still set to macos-13, the last x86 runner) and explicitly use the oldest supported macos-11 instead.
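A minimal sketch of what such a matrix might look like, assuming a CMake-based build. The workflow name, the Ubuntu and Windows entries, and the configure/build commands are illustrative assumptions, not copied from the repositories:

```yaml
# Illustrative workflow; only the macos-11 / macos-14 entries come from the text above.
name: CI

on: [push, pull_request]

jobs:
  build:
    strategy:
      matrix:
        # macos-14 is the Apple Silicon runner; macos-11 is pinned explicitly
        # instead of relying on macos-latest. The other entries are assumptions.
        os: [ubuntu-latest, windows-latest, macos-11, macos-14]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure (assumed CMake project)
        run: cmake -S . -B build
      - name: Build
        run: cmake --build build
```

Because `runs-on` reads the OS from the matrix, adding the Apple Silicon runner really is a one-entry change.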
The new Apple Silicon runner is roughly 2x faster than the x86 one too - nice!
So I got my Raspberry Pi 5. And, like in my previous post, I compiled LLVM 17 on it and compared the compile speed against the Raspberry Pi 4.
I’m going to do the same steps:
1. Compile LLVM using the default clang installed via apt-get.
2. Compile LLVM again using the clang we just built.
3. Compile it a third time with the clang that was built by our own clang. This step should give the most accurate picture of the performance difference between the 4 and the 5, because it should be the same binary compiling the same project.
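As a rough sketch, the three builds could be scripted like this. It assumes an llvm-project checkout in the current directory, Ninja installed, and a Release build with only the clang project enabled. The `build_stage` helper and the directory names are hypothetical, and this is not necessarily the configuration used for the numbers in this post:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: configure and time one LLVM build.
# Usage: build_stage <build-dir> <c-compiler> <c++-compiler>
build_stage() {
  cmake -S llvm-project/llvm -B "$1" -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS=clang \
    -DCMAKE_C_COMPILER="$2" \
    -DCMAKE_CXX_COMPILER="$3"
  time cmake --build "$1"
}

# Stage 1: the distro clang from apt-get.
build_stage build-stage1 clang clang++
# Stage 2: the clang produced by stage 1.
build_stage build-stage2 "$PWD/build-stage1/bin/clang" "$PWD/build-stage1/bin/clang++"
# Stage 3: the clang produced by stage 2 (same compiler binary on both boards).
build_stage build-stage3 "$PWD/build-stage2/bin/clang" "$PWD/build-stage2/bin/clang++"
```

Keeping the CMake options identical across all three stages means the only thing that changes between stages is the compiler doing the building.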