Moving to Python 3

Posted Feb 10, 2011 14:39 UTC (Thu) by Webexcess (guest, #197)
In reply to: Moving to Python 3 by nevyn
Parent article: Moving to Python 3

You didn't provide a link for the problem, but is this what you're looking for?

    os.listdir(b'.') # no decoding for me, thanks
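For illustration, here is a minimal sketch of the behaviour Webexcess describes. In Python 3, `os.listdir()` returns entries of the same type as its argument, so a bytes path yields raw bytes names that are never decoded. The helper name `list_both_ways` is hypothetical. `os.fsencode()`/`os.fsdecode()` (Python 3.2+) convert between the two forms.

```python
import os
import tempfile
from typing import List, Tuple


def list_both_ways(path: str) -> Tuple[List[str], List[bytes]]:
    """Hypothetical helper: list a directory as decoded str and as raw bytes."""
    as_str = sorted(os.listdir(path))                 # str argument -> str entries
    as_bytes = sorted(os.listdir(os.fsencode(path)))  # bytes argument -> bytes entries
    return as_str, as_bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "hello.txt"), "w").close()
        names, raw = list_both_ways(d)
        print(names)  # ['hello.txt']
        print(raw)    # [b'hello.txt']
        # os.fsdecode() maps the raw names back to the same str values
        assert [os.fsdecode(n) for n in raw] == names
```

Any API layered on top has to make the same choice about which type to accept and return. That is the burden nevyn objects to below.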


Moving to Python 3

Posted Feb 10, 2011 15:38 UTC (Thu) by nevyn (guest, #33129)

I'm aware that you can call it (directly) as os.listdir(os.fsencode(mypath)), but this has a number of problems:

1. When calling listdir() directly, the default is broken (and in a non-obvious way) ... so everybody has to remember "Oh, yeh, you have to call os.listdir() in this special way or it's broken".

2. It assumes people are calling os.listdir() directly ... which is _far_ from the normal case. So now, to do the same hack, every API that eventually calls listdir() will have to implement/debug the bytes vs. unicode input vs. output thing ... and every caller of those APIs will have to remember "Oh, yeh, you have to call foo_API() in this special way or it's broken".

3. It's still not obvious what you _do_ with those bytes, because the reason listdir() doesn't work "normally" is that its model of the Universe doesn't match reality. Basically, you can't take a POSIX filename and print "Error: open(%s): %s" ... and this problem is much bigger than POSIX filenames; they're just the most glaringly broken case that people see. So the whole thing is a huge clue that "Unicode" is not any better in py-3 than it is in py-2 (which is to say, it's completely broken).
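Here is one way to build that "Error: open(%s): %s" message in Python 3 without crashing on a name that doesn't decode. The name is first turned back into its exact on-disk bytes with `os.fsencode()`. It is then decoded for display only, using the `backslashreplace` error handler (supported for decoding since Python 3.5), which turns undecodable bytes into `\xNN` text. This is a sketch, not a recommended idiom. The helper names `display_name` and `open_error_message` are hypothetical, and decoding with the filesystem encoding assumes the terminal uses the same encoding.

```python
import os
import sys


def display_name(name) -> str:
    """Hypothetical helper: render a str or bytes filename for messages.

    The result is lossy (undecodable bytes become \\xNN text), so it is
    for display only and must never be passed back to open().
    """
    raw = os.fsencode(name)  # bytes pass through; str is re-encoded with surrogateescape
    return raw.decode(sys.getfilesystemencoding(), errors="backslashreplace")


def open_error_message(name, exc: OSError) -> str:
    """Hypothetical helper: format the error message from nevyn's example."""
    return "Error: open(%s): %s" % (display_name(name), exc.strerror)


if __name__ == "__main__":
    bad = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
    try:
        open(bad)
    except OSError as exc:
        # With no such file on a UTF-8 system, this prints:
        # Error: open(caf\xe9.txt): No such file or directory
        print(open_error_message(bad, exc))
```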

Moving to Python 3

Posted Feb 12, 2011 0:13 UTC (Sat) by cmccabe (guest, #60281)

There was a thread about non-UTF8 filenames on LWN a little while back. (One of many, I'm sure.) The consensus seemed to be that they were quite useless. They tend to break the mental model of programmers too. For example, programmers tend to assume that printing a filename to stdout is *not* a security vulnerability. But if that filename contains control characters... surprise! It can hack your terminal emulator.
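This minimal sketch illustrates the terminal problem. It assumes the only goal is to stop a filename from emitting raw control characters (Unicode category `Cc`, which covers ESC, BEL, newline, DEL and the C1 range), so each one is replaced with a visible `\xNN` escape before printing. The function name `terminal_safe` is hypothetical. It does not deal with other troublesome characters such as bidirectional-override formatting characters (category `Cf`).

```python
import unicodedata


def terminal_safe(name: str) -> str:
    """Hypothetical helper: escape control characters in a filename for display."""
    out = []
    for ch in name:
        if unicodedata.category(ch) == "Cc":
            # ESC (0x1b), BEL (0x07), newline, DEL and C1 controls become \xNN text
            out.append("\\x%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # A name that would set the xterm window title if printed verbatim
    evil = "report\x1b]0;pwned\x07.txt"
    print(terminal_safe(evil))  # report\x1b]0;pwned\x07.txt
```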

Python has a pretty long history of "forcing" what it believes to be the correct behavior on its users. It even tells you how to use whitespace. I am not surprised at all that they ignore non-UTF-8 filenames. Frankly, it's a good decision.

Moving to Python 3

Posted Feb 12, 2011 1:50 UTC (Sat) by foom (subscriber, #14868)

They don't ignore random-byte filenames. Filenames are decoded from bytes to unicode with the *locale encoding* (not always utf8), and the "surrogateescape" error handler. That allows roundtripping filenames through unicode even if they're not in the proper encoding at all (although in that case they'll be garbage).

http://www.python.org/dev/peps/pep-0383/
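The `surrogateescape` error handler makes this round trip visible. This sketch decodes with an explicit UTF-8 codec so that the result doesn't depend on the locale; real filename handling in `os.fsdecode()`/`os.fsencode()` uses the filesystem encoding instead. Each undecodable byte 0xNN becomes the lone surrogate U+DCNN, and encoding with the same handler turns it back into the original byte. The helper name `roundtrip` is hypothetical.

```python
def roundtrip(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: bytes -> str -> bytes using PEP 383's error handler."""
    name = raw.decode(encoding, "surrogateescape")   # bad bytes become U+DC80..U+DCFF
    return name.encode(encoding, "surrogateescape")  # ...and turn back into the same bytes


if __name__ == "__main__":
    raw = b"caf\xe9.txt"  # Latin-1 "café.txt", which is not valid UTF-8
    name = raw.decode("utf-8", "surrogateescape")
    print(repr(name))             # 'caf\udce9.txt'  (the "garbage" str)
    print(roundtrip(raw) == raw)  # True
```

The str in the middle is usable as a filename for `open()` and friends, but it cannot be printed to a strict UTF-8 stream without first being escaped.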

Moving to Python 3

Posted Feb 15, 2011 1:24 UTC (Tue) by yuhong (guest, #57183)

Yea, it is not Python's fault that historically there has been no standard character encoding beyond ASCII for Unix filenames, in contrast to Windows LFN filenames and Mac HFS+ filenames, both of which used UTF-16 from the beginning.

Moving to Python 3

Posted Feb 15, 2011 14:32 UTC (Tue) by nevyn (guest, #33129)

> Yea, it is not Python's fault [that unix doesn't look like windows]

It is exactly python's fault that it pretends unix is like windows, when it isn't.

Moving to Python 3

Posted Feb 15, 2011 14:52 UTC (Tue) by foom (subscriber, #14868)

> It is exactly python's fault that it pretends unix is like windows, when it isn't.

Except that python doesn't actually do that, see comment above...


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.