A report from the documentation maintainer
Posted Nov 3, 2016 19:07 UTC (Thu) by nybble41 (subscriber, #55106)
In reply to: A report from the documentation maintainer by mstone_
Parent article: A report from the documentation maintainer
The instructions provided for setting the character encoding, and the defaults for systems configured to use UTF-8, always seem to include the language as well as the character encoding. This bundling of language, encoding, collation, and other preferences into a single global setting is the root of the problem. The same environment variable controls all of them, unless you set yet another variable to selectively override the sorting rules. It doesn't make sense to use the same settings everywhere, and pattern-matching in the shell programming language is one of those areas where a dependency on the current locale makes no sense.
The fact that the user wants to see messages in English (or whatever language) and use UTF-8 for character encoding should not be taken to imply that they want to change the way the shell expands glob patterns.
As I said before, collation order for user-visible output is not really the point. Sometimes the correctness of a script does depend on the sort order internally (such as *.d directories), but typically the files involved are defined to start with ASCII digits and case issues consequently do not apply (given reasonable locale definitions). Personally I would be content with an option to sort by a (file type, case, filename) triplet which otherwise followed the locale-specific rules. Absent that option, byte-order sorting with LC_COLLATE=C gives the desired behavior in 99.9% of the cases I am ever likely to encounter. YMMV.
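For anyone who wants to try the selective override, here is a minimal sketch, assuming bash. The helper name glob_in_c_collation and the file names are made up for illustration. It leaves the language and encoding settings alone and changes only the collation used when the shell sorts a glob expansion.

```shell
#!/usr/bin/env bash
# Sketch: keep language and encoding as configured, override only collation.
# glob_in_c_collation is a hypothetical helper name, not a standard command.

glob_in_c_collation() {
    # Run in a subshell so the caller's locale settings are left untouched.
    (
        cd "$1" || exit 1
        unset LC_ALL        # LC_ALL, if set, takes precedence over LC_COLLATE
        LC_COLLATE=C        # bash re-reads its locale when this is assigned
        printf '%s\n' *     # expansion sorted by byte value: digits, upper, lower
    )
}

demo_dir=$(mktemp -d)
touch "$demo_dir/10-base.conf" "$demo_dir/Zeta.conf" "$demo_dir/alpha.conf"
glob_in_c_collation "$demo_dir"
rm -rf "$demo_dir"
```

Note that the sketch has to unset LC_ALL inside the subshell, since a user who set LC_ALL would otherwise silently defeat the LC_COLLATE override. That is one more illustration of how fragile the environment-variable approach is.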
> At any rate, the shell isn't a programming language, it's a user interface.
Shell is a user interface in the same sense as any programming language: a user interface designed for use by programmers. It's ridiculous to claim that it isn't a programming language, since a significant fraction of the programs on most Unix systems are written in it. By percentage of commands executed it's a programming language first, with interactive use as a distant second.
> If you don't like this, it's more productive to choose a different language than to rail against reality. Specifically you need something with better defined semantics for pattern matching, probably something with the knobs in the API call itself rather than a bunch of environment variables which alter the semantics in unexpected and non-obvious ways.
What do you think this thread was about? This "better language" you suggest is shell, as it existed before glob patterns became locale-aware and thus context-dependent and dangerously unpredictable. Now the only safe way to write a Bash shell script with glob patterns is by setting the globasciiranges option within the script. For system scripts which can't be fixed, the only option is to force LC_COLLATE=C.
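To make the in-script fix concrete, here is a rough sketch, assuming bash 4.3 or later (where, as far as I know, the globasciiranges option first appeared). The function name list_lowercase_initial and the file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: make bracket ranges in glob patterns behave as in the C locale,
# whatever locale the caller happens to be running under.
# Assumes bash 4.3 or later; list_lowercase_initial is a hypothetical name.

shopt -s globasciiranges

list_lowercase_initial() {
    (
        cd "$1" || exit 1
        shopt -s nullglob   # expand to nothing instead of the literal pattern
        for f in [a-z]*; do
            printf '%s\n' "$f"
        done
    )
}

demo_dir=$(mktemp -d)
touch "$demo_dir/apple" "$demo_dir/Banana" "$demo_dir/cherry"
list_lowercase_initial "$demo_dir"   # Banana does not match [a-z]*
rm -rf "$demo_dir"
```

The option only affects how range expressions such as [a-z] are matched. The order in which the matches are returned still follows the current collation, so a script that depends on ordering needs LC_COLLATE=C as well.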
