bytes vs. characters

Posted Apr 23, 2015 17:00 UTC (Thu) by cesarb (subscriber, #6266)
In reply to: bytes vs. characters by lopgok
Parent article: Report from the Python Language Summit

> I would like a simple way in python 3 to be able to read the names of all the files in a directory. In python 3, it skips over some files which I suspect are not in the current codespace. In python 2, it just reads the names of all of the files.

I just tested here, and the python3 on this machine returns all filenames from os.listdir('.'), even the one I created with an invalid UTF-8 encoding.

Skipping over some files was true in Python 3.0 (https://docs.python.org/3/whatsnew/3.0.html#text-vs-data-...):

"Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising UnicodeError."

(The same paragraph mentions that you could still use os.listdir(b'.') to get all filenames as bytes, so even with Python 3.0 you already had a way to read the name of all the files.)
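For illustration, here is a minimal sketch of the bytes interface. The helper name list_names_as_bytes is made up for this example, and the temporary directory only exists so the snippet runs anywhere; the point is just that a bytes path yields bytes names, with no decoding step that could drop or reject an entry.

```python
import os
import tempfile


def list_names_as_bytes(path):
    """Return every entry name in `path` as raw bytes (hypothetical helper)."""
    # A bytes argument makes os.listdir return bytes names, so no filename
    # ever has to pass through a decoder.
    return sorted(os.listdir(os.fsencode(path)))


# Demonstration on a throwaway directory containing one ordinary file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "plain.txt"), "w").close()
    names = list_names_as_bytes(tmp)
    print(names)
```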

But that was probably changed in Python 3.1, when PEP 383 (https://www.python.org/dev/peps/pep-0383/) was implemented, since with it there are no "filenames that cannot be decoded properly".
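As a rough sketch of what PEP 383 does, the snippet below uses the "surrogateescape" error handler directly on a byte string rather than on a real file, and it assumes a UTF-8 filesystem encoding for the sake of the example. On POSIX systems os.fsdecode() and os.fsencode() apply the same handler with the actual filesystem encoding. The sample name is made up.

```python
# A made-up filename in Latin-1: the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
back = name.encode("utf-8", errors="surrogateescape")

# ascii() is used because a lone surrogate cannot be written to a
# strictly UTF-8 terminal.
print(ascii(name))
print(back == raw)
```

Because the round trip is lossless, a str filename obtained this way can still be handed back to open() or os.stat(), even though it cannot be printed or encoded strictly.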



bytes vs. characters

Posted Apr 23, 2015 22:09 UTC (Thu) by lopgok (guest, #43164) [Link] (3 responses)

It was still broken in Python 3 when I tested it about 2 or 3 months ago, with either Python 3.3 or Python 3.4.

I have a directory which is read just fine with python 2.7, but skips files with python 3.

bytes vs. characters

Posted Apr 24, 2015 11:44 UTC (Fri) by cesarb (subscriber, #6266) [Link] (2 responses)

Does it still skip files if you use the "bytes" interface (os.listdir(b'.'))?

I just took a quick look at the current Python source code for os.listdir (https://hg.python.org/cpython/file/151cab576cab/Modules/p...), and it only has code to skip the "." and ".." entries, as it's documented to do. In both the "str" and the "bytes" case, it adds every entry other than these two. For it to skip anything else on os.listdir, readdir() from glibc has to be skipping it, and it should affect more than just Python.
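One way to check this on the affected directory would be to compare the two listings. This is only a sketch: compare_listings is a hypothetical helper, and the temporary directory stands in for the real one. If the str listing really dropped entries, they would show up in "missing".

```python
import os
import tempfile


def compare_listings(path="."):
    """Compare the str and bytes results of os.listdir (hypothetical helper)."""
    str_names = os.listdir(path)
    bytes_names = os.listdir(os.fsencode(path))
    # Re-encode the str names with the filesystem encoding so both
    # listings can be compared as bytes.
    seen = {os.fsencode(n) for n in str_names}
    missing = sorted(set(bytes_names) - seen)
    return len(str_names), len(bytes_names), missing


# Demonstration on a throwaway directory; point it at the problem
# directory instead to test the real case.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.txt", "b.txt"):
        open(os.path.join(tmp, fname), "w").close()
    n_str, n_bytes, missing = compare_listings(tmp)
    print(n_str, n_bytes, missing)
```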

Or is the problem with something other than os.listdir?

bytes vs. characters

Posted Apr 24, 2015 14:18 UTC (Fri) by lopgok (guest, #43164) [Link] (1 responses)

It is os.listdir. I have not tried accessing it in binary yet.

I do find it odd that the OS can list the file and I can manipulate the file name on the command line, but because it has some odd characters in it, python silently skips over it.

bytes vs. characters

Posted May 9, 2015 21:34 UTC (Sat) by nix (subscriber, #2304) [Link]

Well, if you want to read something no matter what its encoding, you use bytes mode. That's what bytes mode is *for*. Python 3 is really very consistent here (unlike Python 2, for which you had to guess and hope).

