I was talking with someone today who really, really wanted the sqrtps instruction to be used in some code they were writing. And because of a quirk with clang (still there as of clang 18.1.0), if you happened to use -ffast-math, clang would butcher the use of the intrinsic. So for the code:
    __m128 test(const __m128 vec) { return _mm_sqrt_ps(vec); }

Clang would compile it correctly without fast-math:

    test: # @test
      sqrtps xmm0, xmm0
      ret

And create this monstrosity with -ffast-math:
When I interviewed for Epic Games it was for a graphics post - I wanted to get back to working on shader compilers. But even though most of my interviews were with the fantastic graphics side of the company, I had a few interviews about something I knew very little about - the Verse language. And in one of those interviews I was asked about something I hadn’t thought about for 15 years - Transactional Memory.
I’ve updated my C/C++ open source libraries utest.h, utf8.h, ubench.h, hashmap.h, subprocess.h, and json.h to use the new Apple Silicon GitHub CI runners.
So how hard is it? Simple! You just add macos-14 to the build -> strategy -> matrix. I took the opportunity to drop macos-latest (which is still set to macos-13, the last x86 runner) and explicitly use the oldest supported macos-11 instead.
The new Apple Silicon runner is roughly 2x faster than the x86 one too - nice!
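For anyone wanting to try the same change, here is a minimal sketch of what that matrix entry might look like. Only the `build` job name, the `strategy` -> `matrix` path, and the `macos-11` / `macos-14` labels come from the description above; the workflow name, triggers, `os` key, checkout step, and CMake commands are assumptions for illustration, and any non-macOS runners are left out.

```yaml
# Hypothetical workflow sketch: everything except the build job's
# strategy -> matrix path and the two macOS runner labels is assumed.
name: CI

on: [push, pull_request]

jobs:
  build:
    strategy:
      matrix:
        # macos-14 is the Apple Silicon runner; macos-11 is pinned
        # explicitly in place of macos-latest.
        os: [macos-11, macos-14]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # Assumed build steps; substitute whatever the project actually runs.
      - name: Configure
        run: cmake -S . -B build
      - name: Build
        run: cmake --build build
```

Pinning explicit runner labels rather than `macos-latest` means the set of tested platforms only changes when the workflow file does.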
So I got my Raspberry Pi 5. And like in my previous post, I compiled LLVM 17 on the Raspberry Pi 5, and have compared the compile speed versus the Raspberry Pi 4.
I’m going to do the same steps:
1. Compile LLVM using the default clang installed via apt-get.
2. Compile LLVM again using the clang we just built.
3. And compile it a third time with the clang we built with our own clang (this step should give the most accurate picture of the difference in performance between the 4 and 5, because it should be the same binary compiling the same project).
With the imminent launch of the Raspberry Pi 5 I wondered - how long does it take to compile the latest LLVM release (17 at the time of writing this blog) on the Raspberry Pi 4? This will give me a baseline that I can test the Raspberry Pi 5 against once I get ahold of it.
For my initial exploration I decided to test just three things:
Using the stock Clang compiler I could get at via apt-get to compile LLVM.