Changes ahead for Python

Posted Sep 13, 2007 14:20 UTC (Thu) by walters (subscriber, #7396)
Parent article: Changes ahead for Python

"The Py3k solution is to separate strings, which contain decoded text, and byte-strings which are binary data into two distinct types, str and bytes. "

This is wrong as far as I can see; quoting from the page:

"There is only one string type; its name is str but its behavior and implementation are more like unicode in 2.x.
PEP 358: There is a new type, bytes, to represent binary data"

Which makes sense, because what the heck would a byte-string be? There are strings which contain Unicode, and byte *arrays* (calling them strings just leads to confusion).
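
For readers who want to see the distinction concretely, here is a minimal sketch of how the two types behave in Python 3 as it was eventually released, after this thread was written. The variable names are illustrative only and are not taken from the article or the PEP.

```python
# Behavior of released Python 3; names here are illustrative only.
text = "na\u00efve"            # str: a sequence of Unicode code points ("naive" with a diaeresis)
data = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text

assert type(text) is str
assert type(data) is bytes
assert data.decode("utf-8") == text   # decoding round-trips back to str

# The two types do not mix implicitly: concatenation raises TypeError.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)
```

Converting between the two always goes through an explicit encode or decode step with a named encoding.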




Posted Sep 13, 2007 21:32 UTC (Thu) by xorbe (guest, #3165)

Right, the byte arrays are not strings.


Posted Sep 14, 2007 18:39 UTC (Fri) by jordanb (subscriber, #45668)

If you think of it as "Character String" vs. "Byte String" I don't think there's an issue. A character string is a string of characters, which is to say codes that form valid characters in some predetermined encoding, whereas a 'byte string' is a string of completely arbitrary bytes.
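
The "valid characters in some predetermined encoding" point can be sketched in Python 3 as follows. The helper name `is_valid_text` is hypothetical and introduced only for illustration; `bytes.decode` and `UnicodeDecodeError` are the standard built-ins.

```python
def is_valid_text(raw: bytes, encoding: str = "utf-8") -> bool:
    """Return True if raw decodes cleanly under the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# ASCII bytes are also valid UTF-8.
print(is_valid_text(b"hello"))

# 0xFF can never appear in UTF-8, so these arbitrary bytes are not text
# under that encoding.
print(is_valid_text(b"\xff\xfe\x00"))

# The answer depends on the encoding: Latin-1 assigns a character to
# every possible byte value, so the same bytes decode without error.
print(is_valid_text(b"\xff\xfe\x00", "latin-1"))
```

Whether a given run of bytes counts as a character string therefore depends on which encoding is assumed, not on the bytes alone.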


Posted Sep 20, 2007 17:19 UTC (Thu) by larryr (guest, #4030)

The Py3k solution is to separate strings [...] and byte-strings [...] str and bytes
This is wrong as far as I can see; quoting from the page:
There is only one string type; its name is str but its behavior and implementation are more like unicode in 2.x. PEP 358: There is a new type, bytes, to represent binary data
Which makes sense, because what the heck would a byte-string be? There are strings which contain Unicode, and byte *arrays* (calling them strings just leads to confusion).

To me, the array in this case comprises a series of 8-bit characters, which is a string, and a "byte string" is a reasonable and useful thing to call it: a string whose (character) elements each have a size of one byte.

Larry
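
As a point of comparison for the "elements of one byte each" view, here is a small sketch of how released Python 3 (which postdates this thread) treats the elements of a bytes object. The variable names are illustrative only.

```python
text = "\u00e9"               # one character: e with an acute accent
data = text.encode("utf-8")   # its UTF-8 encoding

print(len(text))              # number of characters in the str
print(len(data))              # number of bytes in the encoding

# Indexing a bytes object yields an integer in range(256),
# not a one-character string.
first = data[0]
print(type(first) is int, first)

# Slicing, by contrast, yields another bytes object.
print(data[:1])
```

Here one character occupies two bytes, and each element of the bytes object is a plain integer, which is closer to the "byte array" reading than to a string of 8-bit characters.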

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.