Of bytes and encoded strings

Posted Jan 23, 2014 12:07 UTC (Thu) by tialaramex (subscriber, #21167)
In reply to: Of bytes and encoded strings by Tara_Li
Parent article: Of bytes and encoded strings

It was never simple. I mean, it looks simple to someone already versed in ASCII text systems, so long as you don't mind being hopelessly wrong for everything outside basic Latin in a US-centric locale. So for a lot of English-only users, and indeed for much of Western Europe, it's sort of fine; they already tolerate such nonsense elsewhere.
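
A minimal Python 3 sketch of the "hopelessly wrong outside basic Latin" point: ASCII simply has no bytes for "naïve" or "Straße", while UTF-8 can encode them at the cost of more bytes than characters. The helper `ascii_bytes` is a hypothetical name used only for illustration.

```python
from typing import Optional


def ascii_bytes(text: str) -> Optional[bytes]:
    """Encode text as ASCII, or return None if any character falls outside it."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


for word in ["naive", "naïve", "Straße", "Ελληνικά"]:
    encoded = ascii_bytes(word)
    utf8 = word.encode("utf-8")
    status = "fits in ASCII" if encoded is not None else "not representable in ASCII"
    # Code points and UTF-8 bytes only coincide for pure ASCII text.
    print(f"{word!r}: {status}; {len(word)} code points, {len(utf8)} UTF-8 bytes")
```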

But yeah, not simple. You don't get to blame anybody still alive, because the lack of simplicity is present in writing systems that are centuries old, and the whole _point_ is to represent existing writing systems. If you don't need the existing writing systems then the whole ball of wax is irrelevant, starting with ASCII.

Sometimes people say "But everything works for Latin; clearly these other writing systems are garbage". There is a tiny speck of truth here: perhaps the Chinese writing system (note: not the topolects) will die out in the next century or so because it is so much clumsier than an alphabet (Chinese academics vary both in how likely they think this is and in whether it's desirable). But mostly this is false, because it misses how incredibly distorted existing text APIs are as a result of Latin. Case shifting and case insensitivity? Wacky assumptions about character width? The idea of "punctuation"? Several distinct yet largely interchangeable kinds of "white space"? Blame Latin.
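
A short Python 3 sketch of how the Latin-shaped ideas of "case" and "white space" bend once real Unicode text shows up. It relies only on built-in `str` methods; `caseless_equal` is a hypothetical helper, and it deliberately skips the normalization step that robust caseless matching would usually add.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare strings ignoring case using full Unicode case folding.

    Simplification: robust caseless matching usually also normalizes
    (e.g. NFKC) before folding; that step is omitted here.
    """
    return a.casefold() == b.casefold()


# Case mapping is not a one-to-one, length-preserving table outside ASCII.
print("straße".upper())   # default Unicode uppercase of ß is the two letters "SS"
print("STRASSE".lower())  # so the round trip does not give back "straße"
print(caseless_equal("Straße", "STRASSE"))
print("ſ".upper())        # long s uppercases to a plain "S"
print(len("İ".lower()))   # Turkish dotted capital I lowercases to i + combining dot
print("ΟΔΟΣ".lower())     # Greek sigma: the lowercase form depends on position

# Several kinds of "white space": str.split() with no argument splits on any
# character Python classes as whitespace, while split(" ") only uses U+0020.
samples = {
    "ASCII space": "a b",
    "no-break space": "a\u00a0b",
    "em space": "a\u2003b",
    "zero width space": "a\u200bb",  # not classed as whitespace at all
}
for name, s in samples.items():
    print(f"{name:16} split()={s.split()!r} split(' ')={s.split(' ')!r}")
```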



Of bytes and encoded strings

Posted Jan 23, 2014 12:20 UTC (Thu) by anselm (subscriber, #2796)

Blame Latin.

It could be worse. It could be Arabic. Or Khmer.

Of bytes and encoded strings

Posted Jan 23, 2014 21:15 UTC (Thu) by hummassa (subscriber, #307)

I understand that the Khmer alphabet is actually an alphasyllabary, but why would Arabic be worse?

Of bytes and encoded strings

Posted Jan 23, 2014 21:29 UTC (Thu) by anselm (subscriber, #2796)

Different letter shapes depending on where in the word you are (think »ſ on steroids«)? No proper vowels? Makes Latin script look almost straightforward by comparison :^)

Of bytes and encoded strings

Posted Jan 25, 2014 18:23 UTC (Sat) by jengelh (guest, #33263)

Store it as splines, that'll "fix" it :D

Re: Different letter shapes depending on where in the word

Posted Jan 31, 2014 7:46 UTC (Fri) by ldo (guest, #40946)

Those different letter shapes are a rendering issue, not an encoding issue. Treating them as an encoding issue is a recipe for madness. It is avoided by all sensible text-handling systems.
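
To make the rendering-versus-encoding distinction concrete: in Unicode, ordinary Arabic text stores base letters such as U+0628 ARABIC LETTER BEH, and the shaping step picks the isolated, initial, medial, or final glyph at display time. The separate "presentation form" code points exist mainly for compatibility with legacy encodings, and NFKC normalization folds them back into the base letter. This minimal Python sketch uses only the standard `unicodedata` module and does not show the shaping step itself; the long s from the »ſ on steroids« quip behaves the same way.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH: the single code point ordinary text uses

# Legacy compatibility code points, one per positional shape of BEH.
forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]  # isolated, final, initial, medial

for ch in forms:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(folded):04X} {unicodedata.name(folded)}")

# Long s: NFKC folds it to a plain "s", while canonical normalization (NFC)
# leaves it untouched, since the difference is treated as one of presentation.
print(unicodedata.normalize("NFKC", "\u017f"), unicodedata.normalize("NFC", "\u017f"))
```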

Of bytes and encoded strings

Posted Jan 23, 2014 13:11 UTC (Thu) by mpr22 (subscriber, #60784)

Punctuation is older than the Latin alphabet, and the notion of case distinctions appears to have developed in the Greek alphabet at about the same time and for much the same reasons as it did in the Latin alphabet.

Of bytes and encoded strings

Posted Jan 23, 2014 21:42 UTC (Thu) by tialaramex (subscriber, #21167)

Sure, sorry; it wasn't my intention to say that Latin is responsible for introducing these crazy requirements _to the world_, only that Latin set the requirements for early computer systems.

And further, by Latin I mean the modern writing system derived from ancient Latin and used by many European languages today, not the actual ancient Roman writing system or the dead language.


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds