
Working with UTF-8 in the kernel


Posted Apr 8, 2019 21:18 UTC (Mon) by foom (subscriber, #14868)
In reply to: Working with UTF-8 in the kernel by dvdeug
Parent article: Working with UTF-8 in the kernel

> Fortunately, there are rules for locale-insensitive case-folding, and they aren't random or arbitrary.

That may be, but FAT, exFAT, and NTFS don't use the Unicode case-folding rules. If the justification is compatibility with those systems, do we actually need the (rather complex) Unicode rules?



Working with UTF-8 in the kernel

Posted Apr 8, 2019 23:30 UTC (Mon) by dvdeug (subscriber, #10998)

What rules do they use?

In what way are the Unicode case-folding rules rather complex? For the most part they are simple one-to-one mappings between characters, with a few exceptions that you just have to deal with. The German ß and the various titlecase characters exist in Unicode, and any implementation is going to have to deal with them.
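As a rough illustration of that claim (a Python sketch, not anything the kernel does): Python's `str.casefold()` applies the Unicode Standard's default full case folding, so it can be used to count how many code points fold one-to-one versus one-to-many, and to show the ß and titlecase exceptions. The helper names `casefold_names_match` and `fold_table_stats` are hypothetical, and the counts depend on the Unicode version bundled with the running Python.

```python
def casefold_names_match(a: str, b: str) -> bool:
    # Hypothetical helper: compare two names using Unicode full case folding.
    # str.casefold() handles one-to-many folds such as "ß" -> "ss".
    return a.casefold() == b.casefold()


def fold_table_stats() -> tuple[int, int]:
    # Count code points whose folded form is a single character versus
    # several characters, using whatever Unicode version Python ships.
    one_to_one = one_to_many = 0
    for cp in range(0x110000):
        ch = chr(cp)
        folded = ch.casefold()
        if folded == ch:
            continue  # unaffected by case folding
        if len(folded) == 1:
            one_to_one += 1
        else:
            one_to_many += 1
    return one_to_one, one_to_many


print(casefold_names_match("Straße", "STRASSE"))  # ß folds to "ss"
print("Straße".lower() == "STRASSE".lower())      # plain lowercasing misses it
print(casefold_names_match("\u01c5", "\u01c4"))   # titlecase Dž vs uppercase DŽ
print(fold_table_stats())
```

The one-to-one count comes out far larger than the one-to-many count, which matches the "mostly simple, with a few exceptions" description.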

Working with UTF-8 in the kernel

Posted Apr 9, 2019 15:35 UTC (Tue) by foom (subscriber, #14868)

NTFS and exFAT only map a single UTF-16 code unit to another single UTF-16 code unit, via a lookup table written to disk when the filesystem is created. There is no Unicode normalization, no multi-character equivalence, and no folding for any characters above U+FFFF.

You say that the other cases "have to be dealt with"... but widely used filesystems show that this is not actually the case.
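To make the contrast concrete, here is a Python sketch of name comparison through a 16-bit upcase table of the kind described above (the `$UpCase` system file on NTFS, the up-case table on exFAT). The table here is a hypothetical approximation generated from Python's `str.upper()`, keeping only results that fit in a single code unit; a real volume's table is whatever was written at format time and will differ in detail. The function names are illustrative, not any filesystem's API.

```python
import struct


def build_upcase_table() -> list[int]:
    # Hypothetical stand-in for an on-disk upcase table: one 16-bit entry per
    # UTF-16 code unit. Real volumes use whatever table was written at mkfs time.
    table = list(range(0x10000))  # identity by default
    for unit in range(0x10000):
        if 0xD800 <= unit <= 0xDFFF:
            continue  # surrogate halves map to themselves
        upper = chr(unit).upper()
        # Keep only single-code-unit results; "ß".upper() == "SS" is dropped.
        if len(upper) == 1 and ord(upper) <= 0xFFFF:
            table[unit] = ord(upper)
    return table


UPCASE = build_upcase_table()


def utf16_units(name: str) -> list[int]:
    # Split a name into UTF-16 code units; characters above U+FFFF become
    # surrogate pairs, which the table leaves untouched.
    data = name.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def upcase_names_match(a: str, b: str, table: list[int] = UPCASE) -> bool:
    # Compare code unit by code unit after mapping each through the table:
    # no normalization and no multi-character equivalences.
    return [table[u] for u in utf16_units(a)] == [table[u] for u in utf16_units(b)]


print(upcase_names_match("readme.TXT", "README.txt"))
print(upcase_names_match("straße", "STRASSE"))         # ß has no one-unit uppercase
print(upcase_names_match("\u00e9", "e\u0301"))         # precomposed vs decomposed é
print(upcase_names_match("\U00010428", "\U00010400"))  # Deseret, above U+FFFF
```

With a table like this, ASCII names still compare case-insensitively, but names that full case folding would treat as equal (Straße/STRASSE, or the Deseret pair) stay distinct, and canonically equivalent spellings of é compare unequal.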


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds