bytes vs. characters
Posted Apr 23, 2015 17:00 UTC (Thu) by cesarb (subscriber, #6266)
In reply to: bytes vs. characters by lopgok
Parent article: Report from the Python Language Summit
I just tested here, and the python3 on this machine returns all filenames from os.listdir('.'), including the one I created with an invalid UTF-8 encoding.
Skipping over some files was true in Python 3.0 (https://docs.python.org/3/whatsnew/3.0.html#text-vs-data-...):
"Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising UnicodeError."
(The same paragraph mentions that you could still use os.listdir(b'.') to get all filenames as bytes, so even with Python 3.0 you already had a way to read the name of all the files.)
But that was probably changed in Python 3.1, when PEP 383 (https://www.python.org/dev/peps/pep-0383/) was implemented, since with it there are no "filenames that cannot be decoded properly".
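For reference, the PEP 383 mechanism is the "surrogateescape" error handler: each byte that cannot be decoded is mapped to a lone surrogate code point (byte 0xNN becomes U+DCNN), and encoding with the same handler turns it back into the original byte. The sketch below assumes a UTF-8 filesystem encoding and calls the codec directly; on POSIX, os.fsdecode()/os.fsencode() apply the same handler for you. The helper names and the sample filename are hypothetical.

```python
# Hypothetical helpers illustrating PEP 383 round-tripping, assuming the
# filesystem encoding is UTF-8 (the common case on Linux).

def decode_filename(raw: bytes) -> str:
    # surrogateescape never fails: undecodable byte 0xNN -> U+DCNN
    return raw.decode("utf-8", "surrogateescape")


def encode_filename(name: str) -> bytes:
    # The same handler maps U+DCNN back to the original byte 0xNN
    return name.encode("utf-8", "surrogateescape")


raw = b"report-\xff.txt"  # hypothetical name; 0xff never appears in valid UTF-8

try:
    raw.decode("utf-8")  # strict decoding, the kind of failure Python 3.0 hit
except UnicodeDecodeError as exc:
    print("strict decode fails:", exc.reason)  # invalid start byte

name = decode_filename(raw)
print(ascii(name))  # 'report-\udcff.txt'
print(encode_filename(name) == raw)  # True
```

Because the round trip is lossless, a str-returning os.listdir() no longer has any reason to drop a name.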
Posted Apr 23, 2015 22:09 UTC (Thu)
by lopgok (guest, #43164)
I have a directory that Python 2.7 reads just fine, but Python 3 skips some of its files.
Posted Apr 24, 2015 11:44 UTC (Fri)
by cesarb (subscriber, #6266)
I just took a quick look at the current Python source code for os.listdir (https://hg.python.org/cpython/file/151cab576cab/Modules/p...), and the only entries it skips are "." and "..", as documented. In both the "str" and the "bytes" case, it adds every other entry. For os.listdir to skip anything else, glibc's readdir() would have to be skipping it, and that would affect more than just Python.
Or is the problem with something other than os.listdir?
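One way to check this on a given machine is to create a file whose name is not valid UTF-8 and list its directory both as str and as bytes. This is a sketch under stated assumptions: a Linux system whose filesystem accepts arbitrary bytes in names (e.g. ext4 or tmpfs; macOS APFS rejects such names) and a UTF-8 filesystem encoding. The helpers undecodable_entries() and compare_listings() and the file names are hypothetical.

```python
import os
import tempfile


def undecodable_entries(path: bytes) -> list[bytes]:
    """Hypothetical helper: entries of `path` whose names are not valid UTF-8."""
    bad = []
    for raw in os.listdir(path):  # bytes argument -> bytes names
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


def compare_listings():
    """Create two files in a temporary directory (assumes Linux) and list it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)
        # Hypothetical names; the second one is not valid UTF-8
        for raw in (b"plain.txt", b"bad-\xff.txt"):
            open(os.path.join(tmp_b, raw), "wb").close()
        # str argument -> str names; undecodable bytes become surrogates (PEP 383)
        return os.listdir(tmp), os.listdir(tmp_b), undecodable_entries(tmp_b)


str_names, bytes_names, bad = compare_listings()
print(len(str_names), len(bytes_names))  # 2 2
print(bad)  # [b'bad-\xff.txt']
```

If both listings have the same length, os.listdir is not the part dropping files, and the problem lies somewhere else in the program.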
Posted Apr 24, 2015 14:18 UTC (Fri)
by lopgok (guest, #43164)
I do find it strange that the OS can list the file and I can manipulate its name on the command line, yet because the name contains some unusual characters, Python silently skips it.
Posted May 9, 2015 21:34 UTC (Sat)
by nix (subscriber, #2304)