
DeVault: Announcing the Hare programming language


Posted May 3, 2022 10:41 UTC (Tue) by peter-b (guest, #66996)
In reply to: DeVault: Announcing the Hare programming language by bartoc
Parent article: DeVault: Announcing the Hare programming language

> C++ has char8_t, which is utf-8 or UB and also pointers to it don't alias char*s (or anything else).

To my enormous displeasure, it isn't UB for a char8_t* to point to a string buffer that doesn't contain UTF-8.

> They are a complete and utter mess.

I can absolutely confirm this.



DeVault: Announcing the Hare programming language

Posted May 3, 2022 11:07 UTC (Tue) by mathstuf (subscriber, #69389)

> it isn't UB for a char8_t* to point to a string buffer that doesn't contain UTF-8.

One reason that comes to mind is that it would have to forbid `u8ptr++`: if the pointer currently points at a code point encoded in multiple bytes, advancing by one byte would leave it in the middle of that sequence and so make it UB, no? Or would `++` inspect the encoded length of the current code point and jump by the appropriate amount? That certainly seems novel too.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 0:50 UTC (Thu) by tialaramex (subscriber, #21167)

Because both increment operators in C++ (prefix and postfix) can be overloaded for a user-defined type, this wouldn't be difficult to do, if C++ wanted to.

But I don't expect C++ to actually take that on, for the same reason it still has both of these silly operators in the first place: backward compatibility trumps all other considerations.

Rust's str actually provides both kinds of iteration. If you want the underlying *bytes* (`str::bytes`) you can have those; individual bytes are just UTF-8 code units and on their own don't necessarily mean anything. If you want "characters" (Rust's char, a Unicode scalar value) you can iterate over those instead (`str::chars`), and under the hood that iterator does indeed move forward the appropriate number of bytes each time.

