Filesystems and case-insensitivity
Posted Nov 29, 2018 8:30 UTC (Thu)
by nim-nim (subscriber, #34454)
Parent article: Filesystems and case-insensitivity
Anyway, hope this gets fixed. The transition to UTF-8 was awful for *x filesystems; I sure hope there won't be a v2 with wide-encoding problems added to the mix whenever UTF-8 gets deprecated in favour of something better.
Posted Nov 29, 2018 9:15 UTC (Thu)
by dgm (subscriber, #49227)
[Link] (8 responses)
*when* (not if) the next transition happens
Posted Nov 29, 2018 11:38 UTC (Thu)
by eru (subscriber, #2753)
[Link] (7 responses)
I would hope that never happens. UTF-8 can represent all characters currently in practical use. The main risk is emoji design getting totally out of hand, with people insisting each of them should have a UNICODE code point... oh wait...
Posted Nov 29, 2018 12:41 UTC (Thu)
by chithanh (guest, #52801)
[Link] (6 responses)
That is not correct. In particular, Unicode (and by extension UTF-8) is deficient for some characters in African languages, due to the Unicode Consortium's policy on precomposed characters vs. combining diacritics: they don't want to introduce new canonical equivalences, so those characters can only ever be written with combining marks.
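A concrete way to see that policy in action, as a minimal Java sketch using java.text.Normalizer (U+025B, the open e used in several African orthographies, is just one example of a letter with no precomposed accented form):

    import java.text.Normalizer;

    public class NfcDemo {
        public static void main(String[] args) {
            // "e" + combining acute (U+0301) has a precomposed equivalent,
            // so NFC normalization folds it into one code point (U+00E9):
            String e = Normalizer.normalize("e\u0301", Normalizer.Form.NFC);
            System.out.println(e.codePointCount(0, e.length())); // 1

            // Open e (U+025B) + the same accent has no precomposed code
            // point, and the stability policy means none will be added,
            // so it stays two code points even after NFC:
            String openE = Normalizer.normalize("\u025B\u0301", Normalizer.Form.NFC);
            System.out.println(openE.codePointCount(0, openE.length())); // 2
        }
    }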
Posted May 29, 2019 23:00 UTC (Wed)
by Serentty (guest, #132335)
[Link] (5 responses)
Posted May 30, 2019 14:18 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (4 responses)
(How many primitives would you need for Chinese?)
On the other hand, in that case we wouldn't all be using UTF-8 by now, simply because it would require roughly twice the storage space for Chinese text. Nowadays that doesn't really matter, but at the time it was a problem.
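For concreteness, the actual numbers, checked with a small Java snippet (the two-character string "中文" standing in for Chinese text generally): UTF-8 spends three bytes on each BMP CJK character where UTF-16, or a legacy double-byte encoding like GB2312, spends two.

    import java.nio.charset.StandardCharsets;

    public class CjkSize {
        public static void main(String[] args) {
            String zh = "\u4E2D\u6587"; // "中文"
            // 3 bytes per character in UTF-8:
            System.out.println(zh.getBytes(StandardCharsets.UTF_8).length);    // 6
            // 2 bytes per character in UTF-16 (BE variant, so no BOM):
            System.out.println(zh.getBytes(StandardCharsets.UTF_16BE).length); // 4
        }
    }

So the penalty is closer to 1.5x than 2x, but the point stands.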
Posted May 30, 2019 14:54 UTC (Thu)
by excors (subscriber, #95769)
[Link] (2 responses)
Maybe the 64K limit could have lasted for many more years if they had made some different design choices early on, but given the goal of being a universal standard for all text, it seems inevitable the limit would be broken eventually. It's better to have broken it earlier than later.
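And the break is visible today in how UTF-16, the successor of the original 16-bit design, has to smuggle anything above U+FFFF through as a surrogate pair. A small Java illustration, with U+1F600 as an arbitrary beyond-BMP code point:

    public class BeyondBmp {
        public static void main(String[] args) {
            // One code point above U+FFFF...
            String s = new String(Character.toChars(0x1F600));
            // ...occupies two 16-bit code units in UTF-16:
            System.out.println(s.length());                             // 2
            System.out.println(s.codePointCount(0, s.length()));        // 1
            System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        }
    }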
Posted May 31, 2019 15:06 UTC (Fri)
by smurf (subscriber, #17840)
[Link]
Seems that quite a few Chinese people with interesting names (i.e. ones using archaic characters) suddenly couldn't get official documents any more because, surprise, their names weren't in the "official" charset …
Posted May 31, 2019 18:37 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Jun 6, 2019 5:09 UTC (Thu)
by Serentty (guest, #132335)
[Link]
Posted Nov 29, 2018 12:29 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (1 responses)
Before UTF-8, there never was an encoding that could represent "all non-English languages". At most, an encoding could store one other language, or ten of them (Windows and its brain-dead decision to use 16-bit characters), and even that is a subset of Unicode/UTF-8.
> whenever UTF-8 gets deprecated
It won't be. There's no reason at all to do that, and several billion reasons not to.
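To make the "one other language at most" point concrete: a legacy code page such as ISO 8859-1 can hold Western European text but simply has no representation for anything else, while UTF-8 covers every Unicode code point and leaves plain ASCII byte-for-byte unchanged. A minimal Java check:

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class CodePages {
        public static void main(String[] args) {
            Charset latin1 = StandardCharsets.ISO_8859_1;
            System.out.println(latin1.newEncoder().canEncode("\u00E9")); // true:  é fits
            System.out.println(latin1.newEncoder().canEncode("\u4E2D")); // false: 中 doesn't
            // ASCII text is a byte-identical subset of UTF-8:
            System.out.println("abc".getBytes(StandardCharsets.UTF_8).length); // 3
        }
    }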
Posted Nov 29, 2018 12:43 UTC (Thu)
by eru (subscriber, #2753)
[Link]
To be fair, that was the UNICODE spec at the time. Similarly, Java originally used 16-bit characters (and a char type is still 16 bits wide there). Now Java internally encodes strings as UTF-16 in order to support the expansion of UNICODE.
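For anyone who hasn't followed that evolution: the 16-bit char is still observable in Java today, it just no longer corresponds to a whole character in every case. A short sketch:

    public class JavaChar {
        public static void main(String[] args) {
            System.out.println(Character.SIZE); // 16: char is one UTF-16 code unit
            // A supplementary code point occupies two chars...
            String s = new String(Character.toChars(0x1F600));
            // ...and charAt() exposes the raw surrogate pair:
            System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1)); // D83D DE00
            // The code-point APIs added in Java 5 see through the pair:
            System.out.println(s.codePointAt(0) == 0x1F600); // true
        }
    }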