Filesystems and case-insensitivity
Posted Nov 28, 2018 16:11 UTC (Wed) by willy (subscriber, #9762)
Parent article: Filesystems and case-insensitivity
Chinese characters use a 3-byte encoding in UTF-8, not 4. The CJK Unified Ideographs block runs from U+4E00 to U+9FFF, and every code point from U+0800 to U+FFFF encodes as three bytes.
The CJK extension blocks starting at U+20000 do need 4 bytes, but my understanding is that they hold rare characters (the most common 27,000 or so characters are all below U+FFFF).
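The 3-byte versus 4-byte split follows directly from the UTF-8 length ranges (as specified in RFC 3629). A minimal sketch in Python that checks the boundaries mentioned above; the helper name `utf8_len` is hypothetical, and the table is cross-checked against Python's own encoder:

```python
def utf8_len(cp: int) -> int:
    """Bytes UTF-8 uses for code point cp (hypothetical helper, RFC 3629 ranges)."""
    # Reject values outside Unicode and the UTF-16 surrogate range.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0x7F:
        return 1  # ASCII
    if cp <= 0x7FF:
        return 2  # includes Greek and Cyrillic
    if cp <= 0xFFFF:
        return 3  # rest of the BMP, including U+4E00..U+9FFF
    return 4      # supplementary planes, including U+20000 and up


samples = {
    "U+4E00 (first CJK Unified Ideograph)": 0x4E00,
    "U+9FFF (last code point of that block)": 0x9FFF,
    "U+20000 (start of CJK Extension B)": 0x20000,
}
for label, cp in samples.items():
    # The range table should agree with the real encoder.
    assert utf8_len(cp) == len(chr(cp).encode("utf-8"))
    print(f"{label}: {utf8_len(cp)} bytes")
```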
The scripts worst affected by UTF-8 were Cyrillic and Greek, which now need two bytes for every letter where legacy 8-bit encodings needed only one. But I don't see what better choice there was.
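To make the doubling concrete, here is a minimal sketch comparing UTF-8 with two single-byte legacy encodings that Python ships codecs for (`koi8_r` for Russian, `iso8859_7` for Greek). The sample words and the helper name `byte_counts` are illustrative assumptions, not anything from the original comment:

```python
def byte_counts(text: str, legacy: str) -> tuple[int, int]:
    """Return (UTF-8 size, legacy-encoding size) in bytes (hypothetical helper)."""
    return len(text.encode("utf-8")), len(text.encode(legacy))


samples = {
    "Russian": ("привет", "koi8_r"),   # 6 Cyrillic letters
    "Greek": ("γεια", "iso8859_7"),    # 4 Greek letters, no accents
    "English": ("hello", "ascii"),     # baseline: ASCII is unchanged
}
for lang, (text, legacy) in samples.items():
    utf8, old = byte_counts(text, legacy)
    print(f"{lang}: {len(text)} letters, {utf8} bytes in UTF-8, {old} bytes in {legacy}")
```

Spaces, digits and punctuation remain single bytes in UTF-8, so running Cyrillic or Greek text grows by somewhat less than a factor of two overall.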
