Since my last post introducing utf8.h I’ve been frantically working on fleshing out the core utf8* functions to match the str* ones, and also listening to developer feedback!

Firstly, you can check out the single-header C/C++ library here: utf8.h.

  • @daniel_collin suggested adding an ASCII-only utf8casecmp, which has been added. I’m looking into extending this to support more of the characters in Unicode (the most obvious candidates, and the ones I understand, are ASCII letters with accents).
  • @mcclure111 suggested I actually document the code where appropriate, and I’ve undertaken efforts to remedy this.
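
To give a feel for what "ASCII-only" means for a case-insensitive compare, here is a minimal sketch. It is not the code from utf8.h: the names fold_ascii and ascii_only_casecmp are made up for illustration, and the sketch assumes NUL-terminated input. Only the bytes A-Z are folded, so every byte of a multi-byte UTF-8 sequence is compared exactly as-is.

```c
/* Hypothetical helper (not part of utf8.h): lower-case A-Z only.
   Bytes >= 0x80, which make up multi-byte UTF-8 sequences, pass through. */
static unsigned char fold_ascii(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical ASCII-only case-insensitive compare over NUL-terminated
   strings. Returns -1, 0 or 1, comparing folded bytes as unsigned values. */
static int ascii_only_casecmp(const char *a, const char *b) {
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    for (;;) {
        unsigned char ca = fold_ascii(*pa++);
        unsigned char cb = fold_ascii(*pb++);
        if (ca != cb) {
            return (ca < cb) ? -1 : 1;
        }
        if (ca == '\0') {
            return 0; /* both strings ended at the same point */
        }
    }
}
```

With this sketch "Hello" and "hELLO" compare equal, while "É" and "é" do not, since their encodings (C3 89 and C3 A9) differ in a non-ASCII byte. That gap is exactly what extending the comparison beyond ASCII would need to close.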

Next up I plan to tackle the utf8canon that @KmBenzie suggested, to canonicalize poorly formed UTF-8 sequences into the correct form (for example, an ASCII value can be erroneously encoded as a 4-byte sequence, a so-called overlong encoding, which the UTF-8 specification forbids).
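
As a rough idea of what such canonicalization could involve, the sketch below decodes each sequence without rejecting overlong forms and then re-encodes the codepoint in its shortest form. This is only an illustration under stated assumptions, not a design for utf8canon: the names decode_lenient, encode_shortest and canon_sketch are hypothetical, input is assumed to be NUL-terminated, and surrogates and values above U+10FFFF are not checked.

```c
#include <stddef.h>

/* Hypothetical: decode one sequence at s, accepting overlong forms.
   Returns the number of bytes consumed, or 0 if the lead byte or a
   continuation byte is malformed (a NUL terminator fails the same check). */
static size_t decode_lenient(const unsigned char *s, unsigned long *cp) {
    size_t len, i;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    return len;
}

/* Hypothetical: write cp using the fewest bytes; returns bytes written. */
static size_t encode_shortest(unsigned long cp, unsigned char *out) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical: copy src to dst with every sequence in shortest form.
   dst needs at least strlen(src) + 1 bytes, since output never grows.
   Returns 0 on success, -1 on malformed input or an overlong NUL. */
static int canon_sketch(const char *src, char *dst) {
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    while (*s != 0) {
        unsigned long cp;
        size_t n = decode_lenient(s, &cp);
        if (n == 0 || (n > 1 && cp == 0)) return -1;
        s += n;
        d += encode_shortest(cp, d);
    }
    *d = '\0';
    return 0;
}
```

Fed the overlong bytes F0 80 81 81, this sketch writes the single byte 0x41 ("A"), and a well-formed sequence such as C3 A9 comes out unchanged. Whether a real utf8canon should repair overlong input like this or simply reject it is a design choice the sketch does not settle.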