Recent improvements in GCC diagnostics
Posted Oct 16, 2023 6:41 UTC (Mon) by wtarreau (subscriber, #51152)
In reply to: Recent improvements in GCC diagnostics by dvdeug
Parent article: Recent improvements in GCC diagnostics
Yes, but this was already well known. All of us coming from the DOS world were used to seeing one-for-one character substitutions; I was even used to reading "é" when "Ä" was shown on screen. The problem with UTF-8 is its variable size, which breaks when facing unexpected sequences, particularly when rolling back: it was decided that walking back through the sequence was probably robust enough to support backspace, instead of storing the input in a buffer. As a result, the Linux terminal itself is broken. Just boot to a console with init=/bin/sh, set your locale to latin1, type "é", then press backspace and watch it eat the prompt. I mentioned this 10+ years ago and was told "we know, but it would be difficult to do better"...
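A minimal sketch of why rolling back without a buffer is fragile. The `utf8_erase_last` helper below is hypothetical (it is not the kernel's line-discipline code): it erases one character by walking backwards over UTF-8 continuation bytes (0x80 to 0xBF). On valid UTF-8 that works, but Latin-1 text contains bytes in that same range, so the walk overshoots. It illustrates the general failure mode, not the exact sequence of events on the console.

```python
def utf8_erase_last(buf: bytes) -> bytes:
    """Drop the last character from buf, assuming buf is UTF-8.

    Walks backwards over continuation bytes (0b10xxxxxx) until it reaches
    a byte that could start a character, then cuts there. Nothing records
    how the bytes were actually typed, so the result is only as good as
    the assumption that the input really is UTF-8.
    """
    if not buf:
        return buf
    i = len(buf) - 1
    # Continuation bytes occupy 0x80..0xBF.
    while i > 0 and 0x80 <= buf[i] <= 0xBF:
        i -= 1
    return buf[:i]


# Valid UTF-8: "aé" is b"a\xc3\xa9"; one erase removes only the "é".
print(utf8_erase_last("aé".encode("utf-8")))     # b'a'

# Latin-1: "a©" is b"a\xa9". 0xA9 looks like a continuation byte,
# so the eraser keeps walking back and swallows the "a" as well.
print(utf8_erase_last("a©".encode("latin-1")))   # b''
```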
> Do you want to view changelogs on Debian?
I don't, but there are far fewer problems reading UTF-8 on an ISO-8859-1 terminal than the opposite, because at worst I get a few characters I don't care about, and that's all. That is much better than invisible characters stuck in the middle of nowhere: the invisible non-breaking space that some people mistakenly insert into their command lines with alt+space and that breaks those commands, RTL text that makes your cursor go wild when editing a line, etc.
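The non-breaking-space trap can be reproduced with Python's standard library: `shlex.split`, like a POSIX shell, only treats space, tab, CR and newline as word separators, so `ls` and `-l` stay glued into one word. The `find_invisible` helper is hypothetical, a rough sketch that flags non-ASCII space separators (category Zs) and format controls (category Cf, which includes RTL marks); that category choice is an assumption, not a complete list of troublesome characters.

```python
import shlex
import unicodedata


def find_invisible(cmd: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for non-ASCII characters that render
    as nothing, or as an ordinary-looking space."""
    hits = []
    for i, ch in enumerate(cmd):
        if ch.isascii():
            continue
        cat = unicodedata.category(ch)
        # Zs: space separators such as NO-BREAK SPACE (U+00A0).
        # Cf: format controls such as RIGHT-TO-LEFT MARK (U+200F).
        if cat in ("Zs", "Cf"):
            hits.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


typed = "ls\u00a0-l"            # e.g. typed with alt+space instead of space
print(shlex.split(typed))        # ['ls\xa0-l']: one word, not two
print(find_invisible(typed))     # [(2, 'NO-BREAK SPACE')]
```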
Don't get me wrong, I do understand that some other languages need more bits to store their characters; I just don't like the huge abuse of replacing standard characters with new ones that bring no value, or even emojis (since when does a character need colors other than the font's?).
Posted Oct 16, 2023 12:36 UTC (Mon) by mathstuf (subscriber, #69389)
Since people want to be able to express themselves in ways that culture has made common. Unicode is way more descriptive than prescriptive and that's for the best IMNSHO. IRC had :) and whatnot. With more pixels available, people would obviously want to do more too. I'm not the greatest fan of emoji, but it is far better than slinging raw images around.
Posted Oct 16, 2023 14:32 UTC (Mon) by dvdeug (guest, #10998)
Which is Unix's responsibility; had Microsoft had their way, we'd be using UTF-16.
> I do understand that some other languages need more bits to store their characters, I just don't like the huge abuse that's being made by ...
That's a cop-out. None of the complaints above have anything to do with emoji. They all stem from the inevitable problems of needing more bits and of mixing right-to-left and left-to-right languages. No solution could have done much better in that respect. Either we have a constant-length code of 16 or 32 bits, a variable-length code like UTF-8, or a codepage-switching mechanism (every such mechanism that has supported CJK has also been variable-length; a single-byte codepage-switching mechanism would be horribly inefficient for Chinese).
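The trade-off between the options can be made concrete by counting bytes per character in Python. The `encoded_sizes` helper is just an illustration; the `-le` codecs are used so that no byte-order mark is added. Note that UTF-16 also needs 4 bytes (a surrogate pair) for characters outside the Basic Multilingual Plane, so a 16-bit code is only fixed-length if it cannot represent them at all.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Bytes needed to encode ch in each Unicode encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in ["A", "é", "中", "😀"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# U+0041 {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# U+00E9 {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# U+4E2D {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# U+1F600 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```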
Posted Oct 16, 2023 15:38 UTC (Mon) by rschroev (subscriber, #4164)
Even with fixed-length UTF-32, glyphs are often composed of multiple code points.
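This can be checked with Python's `unicodedata` module: the same visible "é" may be one code point or two, and an emoji with a skin-tone modifier is two code points, so even 4 bytes per code point does not give one unit per glyph. NFC normalization recomposes the accented letter but leaves the emoji sequence alone; this is only an illustration, not full grapheme-cluster segmentation (which Unicode specifies in UAX #29).

```python
import unicodedata

precomposed = "\u00e9"              # é as a single code point
decomposed = "e\u0301"              # e + COMBINING ACUTE ACCENT, renders the same
thumbs = "\U0001F44D\U0001F3FD"     # THUMBS UP SIGN + skin-tone modifier

print(len(precomposed), len(decomposed), len(thumbs))           # 1 2 2
print(len(decomposed.encode("utf-32-le")))                      # 8 bytes for one glyph
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```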
None of this is the responsibility of Unix. It's just the consequence of the complexity of human language.