Szorc: Mercurial's Journey to and Reflections on Python 3
Posted Jan 17, 2020 15:45 UTC (Fri)
by anselm (subscriber, #2796)
In reply to: Szorc: Mercurial's Journey to and Reflections on Python 3 by marcH
Parent article: Szorc: Mercurial's Journey to and Reflections on Python 3
These files could be created on a KOI-8-only partition and their names automatically converted when copied out of it?
Technically there is no such thing as a “KOI-8-only partition” because Linux file systems don't care about character encoding in the first place. Of course you can establish a convention among the users of your system(s) that a certain directory (or set of directories) contains files with KOI-8-encoded names; it doesn't need to be a whole partition. But you will have to remember which is which because Linux isn't going to help you keep track.
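To make the "file names are just bytes" point concrete, here is a minimal sketch. The sample name is made up for illustration; the codec names and the `surrogateescape` error handler are standard Python 3. It shows that the same name yields different byte strings in KOI8-R and UTF-8, that the KOI8-R bytes are not valid UTF-8, and how Python 3 can still carry such bytes through a `str` without loss.

```python
# Hypothetical sample name, chosen only for illustration.
name = "привет.txt"

# The same name as two different byte strings. The file system would store
# either one verbatim and would not record which encoding was meant.
koi8_bytes = name.encode("koi8_r")
utf8_bytes = name.encode("utf-8")


def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Undecodable bytes become lone surrogates in the str and are restored on
# encoding, so the original bytes survive the round trip.
as_str = koi8_bytes.decode("utf-8", "surrogateescape")
round_tripped = as_str.encode("utf-8", "surrogateescape")

print(looks_like_utf8(utf8_bytes), looks_like_utf8(koi8_bytes))
print(round_tripped == koi8_bytes)
```

The round trip preserves the bytes, but the intermediate `str` is not printable text, which is exactly why someone still has to remember which directory uses which encoding.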
Of course there's always convmv to convert file names from one encoding to another, and presumably someone could come up with a clever method to overlay-mount a directory with file names known to be in encoding X so that they appear as if they were in encoding Y. But arguably in the year 2020 the method of choice is to move all file names over to UTF-8 and be done (and fix or replace old software that insists on using a legacy encoding). It's also worth remembering that many legacy encodings are proper supersets of ASCII, so people who anticipate that their files will be processed on a UTF-8-based system could simply, out of basic courtesy and professionalism, stick to the POSIX portable-filename character set and save their colleagues the hassle of having to do conversions.
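As a rough illustration of both suggestions, the sketch below re-encodes file names to UTF-8 and checks a name against the POSIX portable-filename character set. It is not how convmv works internally. The helper names (`convert_name`, `plan_renames`, `is_portable`) and the KOI8-R default are assumptions made for this example, and the planner only reports what it would rename rather than touching anything.

```python
import os
import string

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
PORTABLE = frozenset((string.ascii_letters + string.digits + "._-").encode("ascii"))


def convert_name(raw: bytes, src: str = "koi8_r") -> bytes:
    """Re-encode one file name from src to UTF-8.

    Names that already decode as UTF-8 (including plain ASCII) are returned
    unchanged, so running the conversion twice does not mangle them.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(src).encode("utf-8")


def plan_renames(directory: bytes, src: str = "koi8_r") -> list:
    """List (old, new) byte-name pairs that a conversion would rename."""
    plan = []
    for old in sorted(os.listdir(directory)):  # bytes in, bytes out
        new = convert_name(old, src)
        if new != old:
            plan.append((old, new))
    return plan


def is_portable(raw: bytes) -> bool:
    """True if the name is non-empty and uses only portable characters."""
    return bool(raw) and all(b in PORTABLE for b in raw)
```

A caller would review the output of `plan_renames(b"/some/dir")` and only then apply `os.rename` to each pair. The "already valid UTF-8" check is a heuristic, since a short legacy-encoded name can occasionally also be valid UTF-8.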
Posted Jan 17, 2020 16:35 UTC (Fri)
by marcH (subscriber, #57642)
How do you know they use Linux? Even if they do, they could/should still use VFAT on Linux which does have iocharset, codepage and what not.
And now case insensitivity even - much trickier than filename encoding.
Or NTFS maybe.
Posted Jan 17, 2020 16:51 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
There were also DOS (original and alternative) and ISO code pages, but they were rarely used.
Posted Jan 17, 2020 17:35 UTC (Fri)
by marcH (subscriber, #57642)
So how did Linux and Windows users exchange files in Russia? Not?
Which software layer should force users to make their encodings explicit is not obvious; I think we can all agree to disagree on where. If it's enforced "too low" it breaks too many use cases. Enforcing it "too high" is like not enforcing it at all. In any case I'm glad "something" is breaking stuff and forcing people to start cleaning up "bag of bytes" filename messes.
Posted Jan 17, 2020 17:49 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
At that time the most commonly used versions of Windows (95 and 98) also didn't support Unicode, adding to the problem.
This was mostly fixed by the late 2000s with the advent of UTF-8 and Windows versions with UCS-2 support.
However, I still have a historic CVS repo with KOI-8 names in it. So it's pretty clear that something like Mercurial needs to support these niche users.
Posted Jan 17, 2020 18:06 UTC (Fri)
by marcH (subscriber, #57642)
A cleanup flag day is IMHO the best trade-off.
Posted Jan 18, 2020 22:40 UTC (Sat)
by togga (subscriber, #53103)
Tradeoff for what? Giving an incalculable number of users problems for sticking with a broken language?
Posted Jan 18, 2020 22:48 UTC (Sat)
by marcH (subscriber, #57642)
s/language/encodings/
This entire debate summarized in less than 25 characters. My pleasure.
KOI-8 was the encoding widely used on Linux for the Russian language. Win1251 was used on Windows.
Using codepage converters. But it was so bad that by the early 2000s all browsers supported automatic encoding detection, using frequency analysis to guess the code page.
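The frequency-analysis idea can be sketched in a few lines. This is a toy, not what any particular browser shipped. The candidate list, the set of "common" Russian letters, and the function names are assumptions for illustration. It leans on the fact that KOI8-R and Windows-1251 place lowercase and uppercase Cyrillic in swapped byte ranges, so a wrong guess tends to decode to uppercase letters or symbols rather than plausible lowercase text.

```python
# Ten letters that are among the most frequent in Russian text
# (an approximation used only for this sketch).
COMMON = frozenset("оеаинтсрвл")

# Candidate legacy encodings, using Python's standard codec names.
CANDIDATES = ("koi8_r", "cp1251", "cp866")


def score(raw: bytes, encoding: str) -> int:
    """Count decoded characters that are common lowercase Russian letters."""
    text = raw.decode(encoding, errors="replace")
    return sum(ch in COMMON for ch in text)


def guess_encoding(raw: bytes, candidates=CANDIDATES) -> str:
    """Return the candidate whose decoding looks most like Russian text.

    Ties (for example pure-ASCII input) fall back to the first candidate.
    """
    return max(candidates, key=lambda enc: score(raw, enc))


sample = "привет, это проверка кодировки"
print(guess_encoding(sample.encode("koi8_r")))
print(guess_encoding(sample.encode("cp1251")))
```

Real detectors use richer statistics such as letter-pair frequencies, and short inputs like file names give them far less to work with than a web page does.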