DeVault: Announcing the Hare programming language
Posted May 2, 2022 18:09 UTC (Mon) by excors (subscriber, #95769)
In reply to: DeVault: Announcing the Hare programming language by excors
Parent article: DeVault: Announcing the Hare programming language
To be a bit more concrete here: If I create a file like "\xffdummy.txt", and try to read the filename with glob::glob or os::diropen, or pass it as a command-line argument, then Hare dies with:
> Abort: Assertion failed: /usr/src/hare/stdlib/strings/cstrings.ha:33:1
because of an "assert(utf8::valid(s));"
If I create a file like "\xf8dummy.txt", then utf8::valid (wrongly) thinks it is valid, so I can read the filename into a str. strings::iter says the first rune (aka Unicode codepoint) is U+935B6D. strings::riter says:
> Abort: /usr/src/hare/stdlib/strings/iter.ha:68:22: Invalid UTF-8 string (this should not happen)
At least it's not a memory-safety error in this case, though it wouldn't be surprising if some other function did non-bounds-checked accesses under the assumption that strings are UTF-8 (as promised by the specification).
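For comparison, here is a minimal Rust sketch of what a strict validator does with those two filenames. Neither 0xff nor 0xf8 can start a UTF-8 sequence under RFC 3629 (0xf8 would begin a five-byte sequence, which the RFC disallows), so both should be rejected at byte 0. The helper name is_strict_utf8 is made up for illustration; std::str::from_utf8 is the standard library's validator.

```rust
/// Hypothetical helper: true only if `bytes` is well-formed UTF-8
/// according to the Rust standard library's validator.
fn is_strict_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // The two filenames from the examples above, as raw bytes.
    let names: [&[u8]; 2] = [b"\xffdummy.txt", b"\xf8dummy.txt"];
    for name in names {
        match std::str::from_utf8(name) {
            Ok(s) => println!("valid: {s}"),
            // valid_up_to() is the length of the longest valid prefix.
            Err(e) => println!("invalid after {} valid bytes: {e}", e.valid_up_to()),
        }
    }
    // A plain ASCII name passes the same check.
    println!("{}", is_strict_utf8(b"dummy.txt"));
}
```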
Even when it just aborts, that seems unfortunate for a systems programming language - maybe you clone a Git repository with some non-UTF-8 filenames, and your 'ls' and 'rm' were written in Hare so now you can't see or delete the files.
I think the fundamental problem here is that if you want to build a language with Unicode strings, and use it to interact with external systems, you need a good way to handle strings that are not quite Unicode. C/C++/Go/etc just don't bother guaranteeing Unicode. Python 3 got really into Unicode before discovering it didn't work for filenames or lots of other real-world data, and bodged it with surrogateescape (so now Python's Unicode strings aren't Unicode) and with many duplicated APIs between str/bytes/bytearray. Rust uses its type system to provide Path/OsString/etc which handle non-Unicode strings safely, with trait-based conversions that mean the easy cases are still easy to write (File::open("foo.txt") etc) and explicit fallible/lossy conversions when you really need a Unicode string.
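A rough sketch of the Rust approach, assuming a Unix target (the std::os::unix::ffi extension traits are Unix-only). The describe helper is hypothetical and only illustrates the fallible (to_str) and lossy (to_string_lossy) conversions; the non-UTF-8 name is carried around as an OsString without ever being validated.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};
use std::path::Path;

/// Hypothetical helper: describe a filename without assuming it is UTF-8.
fn describe(name: &OsStr) -> String {
    match name.to_str() {
        // Fallible conversion succeeded: the name is valid Unicode.
        Some(s) => format!("unicode: {s}"),
        // Otherwise fall back to a lossy copy for display purposes only.
        None => format!("non-unicode, lossy: {}", name.to_string_lossy()),
    }
}

fn main() {
    // Build the problematic filename from raw bytes; no validation happens here.
    let raw = OsString::from_vec(b"\xffdummy.txt".to_vec());
    let path = Path::new(&raw);

    // The original bytes survive the round trip through Path unchanged,
    // so the name could still be passed to remove_file or File::open.
    assert_eq!(path.as_os_str().as_bytes(), &b"\xffdummy.txt"[..]);

    println!("{}", describe(&raw));
    println!("{}", describe(OsStr::new("foo.txt")));
}
```

The lossy string is only suitable for showing to a user; anything that has to touch the file again should keep using the OsString or Path value.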
Hare wants Unicode strings (which I think is a good goal), but the standard library needs to provide an interface to the not-quite-Unicode real world, and I'm not sure if the language has enough features to ever implement it well.
