> Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust.
Posted Sep 23, 2024 19:07 UTC (Mon) by NYKevin (subscriber, #129325)
In reply to: > Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust. by adobriyan
Parent article: Resources for learning Rust for kernel development
The basic reason this is painful is that:
1. You are trying to round-trip possibly-invalid Unicode...
2. ...and (presumably) do some string transformations on it that may not be reasonably applicable to invalid Unicode...
3. ...and then you want to impose a restriction on the output format (no embedded nulls) that was not applied to the input (OsString does not prohibit embedded nulls, since it can be constructed directly from String without checking for nulls).
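To make point 3 concrete, here is a minimal sketch of where the mismatch surfaces. The helper name `to_c_string` is hypothetical, and the sketch assumes Rust 1.74 or later for `OsString::into_encoded_bytes()`. Building the OsString from a String containing a NUL succeeds without complaint; the check only happens at the CString boundary.

```rust
use std::ffi::{CString, OsString};

/// Hypothetical helper: convert an OsString into a CString.
/// On failure, returns the byte offset of the first embedded NUL.
fn to_c_string(s: OsString) -> Result<CString, usize> {
    // CString::new() scans for interior NUL bytes and rejects them;
    // OsString never performed that check on the way in.
    CString::new(s.into_encoded_bytes()).map_err(|e| e.nul_position())
}

fn main() {
    // Constructing the OsString succeeds even with an embedded NUL.
    let with_nul = OsString::from(String::from("foo\0bar"));
    assert_eq!(to_c_string(with_nul), Err(3));

    // A string without NULs converts cleanly.
    let clean = OsString::from("foobar");
    assert_eq!(to_c_string(clean).unwrap().as_bytes(), b"foobar");
}
```

What to do with the error (reject, truncate, escape) is exactly the "what does correct mean here" decision, so the sketch only reports the offset and leaves the policy to the caller.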
Of course this is all doable, but if you want it to function correctly, you are going to have to stop and think a little bit about what "function correctly" even means in this context. Frankly, you are either going to have pain up-front or pain later (when it does something subtly wrong), no matter what language you use for this. Rust is a little unusual in that it forces you to have that pain up-front instead of later, but that's arguably the whole point of using Rust.
Anyway, I would also point out that Rust does provide slice::utf8_chunks(), which makes this at least somewhat practical (if a little fiddly). See https://doc.rust-lang.org/stable/std/primitive.slice.html... for example code. Of course, if you're not in UTF-8, that's useless... but I'm not convinced this is even feasible in (most) encodings other than UTF-8 in the first place. UTF-8 is self-synchronizing, so you can "resume" decoding it after getting interrupted by invalid bytes, but most other encodings make no attempt to support that. For example, if UTF-16 gets offset by one byte, then the whole rest of the string will be parsed incorrectly, and that's the easy one; Shift JIS is even worse, since it reuses some single-byte code units as the second byte of a two-byte sequence. You probably could do it in a legacy 8-bit encoding like ISO-8859-*, but meh, at that point you can just iterate one byte at a time anyway.
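As a rough illustration of the utf8_chunks() approach, the sketch below applies a string transformation to the valid parts and copies the invalid bytes through untouched. The function name and the choice of uppercasing as the transformation are made up for the example, and it assumes Rust 1.79 or later, where utf8_chunks() is available on byte slices.

```rust
/// Hypothetical transformation: uppercase every valid UTF-8 run and
/// pass any invalid bytes through unchanged, so the result round-trips
/// whatever was not decodable.
fn uppercase_preserving_invalid(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    for chunk in input.utf8_chunks() {
        // valid() is the longest valid &str prefix of this chunk.
        out.extend_from_slice(chunk.valid().to_uppercase().as_bytes());
        // invalid() is the invalid byte sequence that ended the chunk
        // (empty for the final chunk if the input ended cleanly).
        out.extend_from_slice(chunk.invalid());
    }
    out
}

fn main() {
    // 0xFF can never appear in well-formed UTF-8.
    let input = b"foo\xFFbar";
    assert_eq!(uppercase_preserving_invalid(input), b"FOO\xFFBAR");
}
```

On Unix, the bytes could come from an OsStr via std::os::unix::ffi::OsStrExt::as_bytes(). Whether copying invalid bytes through unchanged is the right policy depends on the transformation, which is the fiddly part.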
