
Working with UTF-8 in the kernel

Posted Apr 8, 2019 6:21 UTC (Mon) by cpitrat (subscriber, #116459)
In reply to: Working with UTF-8 in the kernel by foom
Parent article: Working with UTF-8 in the kernel

If the primary use case is to be compatible with NTFS, why not implement it the same way? As I understand it, NTFS support will require a fake Unicode version?



Working with UTF-8 in the kernel

Posted Apr 8, 2019 21:49 UTC (Mon) by foom (subscriber, #14868) [Link]

I don't know.

It does seem rather incongruous to me to justify the feature by pointing to Samba's emulation of NTFS case folding and Android's emulation of FAT file name lookup rules, but then to implement Unicode normalization and correct Unicode case folding... which those don't do.
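To make the mismatch concrete, here is a minimal Python sketch contrasting the two kinds of lookup key. It is only an illustration: `simple_key` and `normalized_key` are hypothetical names, the per-character uppercasing is a stand-in for a table-driven scheme and not Samba's or Android's actual code, and the choice of NFD plus `str.casefold()` is an assumption about what "normalization and correct case folding" would look like.

```python
import unicodedata

def simple_key(name: str) -> str:
    # Stand-in for a per-character upcase table: no normalization at all.
    return "".join(ch.upper() for ch in name)

def normalized_key(name: str) -> str:
    # Assumed scheme: canonical decomposition (NFD) followed by full case folding.
    return unicodedata.normalize("NFD", name).casefold()

composed = "\u00c9"      # LATIN CAPITAL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Without normalization the two spellings stay distinct names.
print(simple_key(composed) == simple_key(decomposed))
# With normalization plus case folding they collapse to one name.
print(normalized_key(composed) == normalized_key(decomposed))
```

Under these assumptions the two schemes disagree about whether the names collide, which is the kind of incompatibility being pointed out.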

Working with UTF-8 in the kernel

Posted Apr 11, 2019 20:49 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

Because, as I understand it, UTF-16 is now seen to have been a mistake.

Forcing all filenames to be valid UTF-16 will break quite a lot elsewhere ... I think that if you want to implement the Unicode universe properly in UTF-16, you end up back with the 8-bit codeset mess, only bigger ...

Cheers,
Wol

Working with UTF-8 in the kernel

Posted Apr 11, 2019 23:15 UTC (Thu) by foom (subscriber, #14868) [Link]

Er what? I don't really understand your comment, but NTFS doesn't implement UTF-16.

It stores filenames as arbitrary sequences of 16-bit values. There are a few tens of values you cannot use (ASCII control characters 0-31, and some ASCII punctuation), but everything else is fair game. In particular, invalid UTF-16 containing broken surrogate pairs is perfectly fine.
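A small Python sketch of that distinction, for illustration only. The function names are hypothetical, and the exact punctuation set in `FORBIDDEN_PUNCTUATION` is an assumption (the usual Win32 reserved characters), since the comment only says "some ASCII punctuation".

```python
# Assumed set of forbidden ASCII punctuation; not taken from the comment above.
FORBIDDEN_PUNCTUATION = {ord(c) for c in '"*/:<>?\\|'}

def is_storable(units):
    """True if every 16-bit value passes the per-value rule described above."""
    return all(
        0 <= u <= 0xFFFF and u > 31 and u not in FORBIDDEN_PUNCTUATION
        for u in units
    )

def is_well_formed_utf16(units):
    """True if every surrogate in the sequence is part of a proper pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True

# 'a', a lone high surrogate, 'b': broken UTF-16, yet every value is allowed.
name = [0x0061, 0xD800, 0x0062]
print(is_storable(name), is_well_formed_utf16(name))
```

The point of keeping the two checks separate is that a name can pass the first and fail the second, so requiring valid UTF-16 would reject names that the per-value rule accepts.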


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds