Working with UTF-8 in the kernel

Posted Apr 11, 2019 20:49 UTC (Thu) by Wol (subscriber, #4433)
In reply to: Working with UTF-8 in the kernel by cpitrat
Parent article: Working with UTF-8 in the kernel

Because, as I understand it, utf-16 is now seen to have been a mistake.

Forcing all filenames to be valid utf-16 will break quite a lot elsewhere ... I think that if you want to implement the utf universe properly in utf-16, you end up back with the 8-bit codeset mess, only bigger ...

Cheers,
Wol


Working with UTF-8 in the kernel

Posted Apr 11, 2019 23:15 UTC (Thu) by foom (subscriber, #14868) [Link]

Er what? I don't really understand your comment, but NTFS doesn't implement utf-16.

It stores filenames as arbitrary sequences of 16-bit values. There are a few tens of values you cannot use (ascii control characters 0-31, and some ascii punctuation), but everything else is fair game. In particular, invalid utf-16 containing broken surrogate pairs is perfectly fine.
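To make "broken surrogate pairs" concrete, here is a rough sketch of a check for whether a sequence of 16-bit values is well-formed utf-16. The function name and the sample names are made up for illustration; this is not anything NTFS or the kernel actually runs, just the surrogate-pairing rule written out.

```python
def is_well_formed_utf16(units):
    """Return True if a list of 16-bit code units is well-formed UTF-16.

    Hypothetical helper for illustration only. It checks surrogate pairing
    and nothing else (no forbidden-character rules).
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate is unpaired.
            return False
        i += 1
    return True


# "a" followed by U+1F600, encoded as the surrogate pair D83D DE00.
paired_name = [0x0061, 0xD83D, 0xDE00]

# "a", a lone high surrogate, then "b": not well-formed UTF-16, yet
# (per the comment above) still an acceptable sequence of 16-bit values.
broken_name = [0x0061, 0xD83D, 0x0062]

print(is_well_formed_utf16(paired_name))
print(is_well_formed_utf16(broken_name))
```

A filesystem that treats names as opaque 16-bit values never runs a check like this, which is why a name like `broken_name` can exist on disk even though it cannot be converted to valid utf-8.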


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds