Reasons for speedup?
Posted Feb 12, 2025 19:49 UTC (Wed) by excors (subscriber, #95769)
In reply to: Reasons for speedup? by jreiser
Parent article: Rewriting essential Linux packages in Rust
With LANG=C.UTF-8, coreutils spends most of its time in strcoll_l, and it sorts by what I presume is some Unicode collation algorithm.
As far as I can see, uutils has no locale support. It aborts if the input is not valid UTF-8 ("sort: invalid utf-8 sequence of 1 bytes from index 0"). It simply sorts by byte values (equivalent to sorting by codepoint), regardless of LANG.
So in this case it's only faster because it doesn't implement Unicode collation.
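To illustrate the "byte values are equivalent to codepoints" point: a minimal Python sketch, not uutils' actual code, with a made-up word list. It checks that sorting the UTF-8 encoded bytes gives the same order as sorting by codepoint, and that neither resembles a locale-aware collation (uppercase "Zoo" sorts before "apple", and accented letters land after "z").

```python
# Hypothetical sample data, chosen only to mix ASCII, accented, CJK and emoji.
words = ["zebra", "Ärger", "apple", "éclair", "Zoo", "😀", "中文"]

# Python compares str values codepoint by codepoint.
by_codepoint = sorted(words)

# Sort the raw UTF-8 bytes instead, then decode for display.
by_bytes = [b.decode("utf-8") for b in sorted(w.encode("utf-8") for w in words)]

# UTF-8 is designed so that byte order preserves codepoint order.
assert by_bytes == by_codepoint

print(by_codepoint)
# ['Zoo', 'apple', 'zebra', 'Ärger', 'éclair', '中文', '😀']
```

A collating sort would be expected to put "Ärger" and "apple" near each other; that extra work is what the byte-wise comparison skips.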
Posted Feb 12, 2025 20:08 UTC (Wed)
by excors (subscriber, #95769)
Posted Feb 18, 2025 3:10 UTC (Tue)
by ehiggs (subscriber, #90713)
Regardless, it should do Unicode canonicalization, or it will mis-sort depending on how different runes are composed. This is fine for diacritic-free languages like English, but in my experience LANG=C's naïve handling of text breaks as soon as you get some diacritics.
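A small sketch of the composition problem, in Python with invented sample words (it uses the standard `unicodedata` module and is not meant to show what any sort implementation does). The same visible word "éclair" is stored once with a precomposed é (U+00E9) and once as "e" plus a combining acute accent (U+0301). A plain codepoint sort separates the two; normalizing to NFC first makes them identical.

```python
import unicodedata

composed = "\u00e9clair"     # 'é' as the single codepoint U+00E9
decomposed = "e\u0301clair"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Hypothetical input: both spellings of the same word plus two plain ASCII words.
words = [composed, "fig", "eel", decomposed]

# Codepoint order: the decomposed form sorts among the 'e' words,
# the composed form sorts after 'fig'.
raw = sorted(words)
assert raw == ["eel", decomposed, "fig", composed]

# Canonicalize to NFC before sorting so both spellings compare equal.
normalized = sorted(unicodedata.normalize("NFC", w) for w in words)
assert normalized[2] == normalized[3] == composed
```

Note that normalization only makes equivalent spellings compare equal. It is still a codepoint sort, so "éclair" ends up after "fig" rather than next to "eel" as a locale-aware collation would likely place it.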
