
DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:27 UTC (Mon) by ddevault (subscriber, #99589)
In reply to: DeVault: Announcing the Hare programming language by mpr22
Parent article: DeVault: Announcing the Hare programming language

I understand where you're coming from in this respect. Again, this was a difficult choice, and we may revisit it, but it simplifies the situation considerably for the 99.999% of use-cases where non-UTF-8 filenames are not present. In the remainder, it's very unlikely that anything worse than the program aborting (e.g. overwriting such files) will occur. I would encourage you to present your case for this at the standard library design acceptance review committee (or the filesystem committee, a likely spin-off), which will be formed prior to 1.0.



DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:39 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (2 responses)

Didn't Python 3 already solve this problem with surrogateescape?

I'm not saying it's an elegant solution, or even necessarily a good solution, but it's less painful for the 99% case (as compared to using a tagged union), and it works for the 1% case, usually without losing any data (unless you're trying to manipulate filenames as strings, in which case any solution will probably be terrible anyway because there's just no good way to do that).
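For readers who have not met surrogateescape, here is a minimal sketch of the round trip it provides. The filename bytes are made up for illustration; the only API used is the standard `errors="surrogateescape"` handler on `bytes.decode` and `str.encode`.

```python
# A hypothetical filename as raw bytes. The byte 0xFF can never appear
# in valid UTF-8, so a strict decode of this name would fail.
raw = b"report-\xff.txt"

# surrogateescape maps each undecodable byte 0xNN to the lone
# surrogate code point U+DCNN instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the
# original byte, so no data is lost on the way back to the OS.
restored = name.encode("utf-8", errors="surrogateescape")

print(repr(name))
print(restored == raw)
```

The decoded value is an ordinary `str` as far as the type is concerned, which is what makes the common case painless.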

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:42 UTC (Mon) by ddevault (subscriber, #99589) [Link]

I mean, it's a complex set of trade-offs. To consider the Python approach, we'd have to untangle a pretty large can of worms having to do with string handling. We can't just lift surrogateescape wholesale - Python and Hare have very different goals and we have to think carefully through every implication of such a change so that the language remains consistent and reliable in its design throughout. And yes, it's an inelegant solution - and we prefer the elegant ones.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 20:40 UTC (Mon) by excors (subscriber, #95769) [Link]

surrogateescape means that code like print(*os.listdir()) can throw a UnicodeEncodeError. The programmer has to manually keep track of which values of the 'str' type are really Unicode and will work with all the standard string functions, and which are nearly-Unicode but will occasionally throw if you try to print or encode them like normal strings. It might be the best hack that's possible in Python given its compatibility constraints, but I think it creates as many problems as it solves. A statically typed language should be able to do better - tracking this kind of information about the range of values is what type systems are for.
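A small sketch of that failure mode, assuming a made-up filename containing the invalid byte 0xFF. It uses a strict `encode("utf-8")` to stand in for what printing to a UTF-8 stdout does; the exact behaviour of `print` depends on how the output stream's encoding and error handler are configured.

```python
# Hypothetical non-UTF-8 filename, decoded the way os.listdir() would
# hand it back on a system with a UTF-8 filesystem encoding.
name = b"report-\xff.txt".decode("utf-8", errors="surrogateescape")

# The value has type str, but it is not encodable as strict UTF-8,
# because it contains the lone surrogate U+DCFF.
try:
    name.encode("utf-8")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("cannot encode:", exc.reason)

# Nothing in the type distinguishes it from a well-formed string.
print(type(name) is str, failed)
```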

At least if you get it wrong in Python, you'll probably just get an exception. In a non-memory-safe language, some code might rely on the promise that strings are Unicode and violating that could cause undefined behaviour; you need to either strictly enforce that promise, or not make the promise at all.
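To illustrate the "strictly enforce the promise" option, here is a rough sketch in Python of keeping the two cases in separate types at the boundary. The helper names `classify_name` and `describe` are hypothetical, and this is not a description of how Hare or any particular library does it.

```python
from typing import Union


def classify_name(raw: bytes) -> Union[str, bytes]:
    """Return a str only when the bytes are valid UTF-8; otherwise
    hand the raw bytes back unchanged (hypothetical helper)."""
    try:
        return raw.decode("utf-8")  # strict: raises on invalid input
    except UnicodeDecodeError:
        return raw


def describe(name: Union[str, bytes]) -> str:
    """Callers must handle both cases explicitly."""
    if isinstance(name, str):
        return f"text: {name}"
    return f"raw bytes: {name!r}"


print(describe(classify_name(b"notes.txt")))
print(describe(classify_name(b"report-\xff.txt")))
```

With this shape, every `str` that reaches the rest of the program really is Unicode, at the cost of making callers deal with the bytes case wherever it can occur.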


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds