
Unicode and Bytes


Posted Nov 16, 2009 13:50 UTC (Mon) by kleptog (subscriber, #1183)
In reply to: Unicode and Bytes by pboddie
Parent article: Python moratorium and the future of 2.x

I find it interesting that Perl 5.6 introduced Unicode everywhere internally and no one had to rewrite a thing. Internally everything is Unicode, but you generally don't even notice. The people who *needed* Unicode support could write their programs, though, and all the extension modules just worked (for the most part).

For example, chr() can now take numbers greater than 255 (and ord() can return them), but that doesn't bother people much. Your *source* will still be interpreted as latin1, but that doesn't bother many people either, since you don't usually need Unicode in your source. (I'm somewhat baffled by Python's "unicode" construct at source level; assuming the source was latin1 would have made transitions easier.)
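To make the chr() point concrete, here is a rough sketch of the same idea in Python 3 rather than Perl (the helper name fits_latin1 is made up for illustration). It only shows that code points above 255 are valid characters that no longer fit in a single latin1 byte:

```python
# Python 3 sketch (an analogue, not Perl): code points above 255 are
# ordinary characters, but they cannot be stored as one latin1 byte.

def fits_latin1(text):
    """Return True if every character in text is encodable as latin1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

euro = chr(0x20AC)       # EURO SIGN, code point 8364
e_acute = chr(0xE9)      # LATIN SMALL LETTER E WITH ACUTE, code point 233

euro_code = ord(euro)    # round-trips to the same number
```

Here fits_latin1(e_acute) is true and fits_latin1(euro) is false, which is the boundary the comment is talking about.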

What mostly happened is that when you tried to send Unicode data over a pipe or to a file, you got an error. People filed bugs, the appropriate encode() call was added (or "use bytes" if people wanted to punt), and all was well. In the meantime, the workaround was to encode prior to calling the module, so it was no big deal.
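The "encode before you hand it over" workaround looks roughly like this in Python 3. This is a sketch of the general pattern, not of any Perl module: the send() helper is hypothetical, and io.BytesIO stands in for a pipe or a file opened in binary mode.

```python
import io

def send(stream, text, encoding="utf-8"):
    """Encode text explicitly, write the bytes, return the byte count."""
    data = text.encode(encoding)
    stream.write(data)
    return len(data)

pipe = io.BytesIO()  # stand-in for a pipe or a binary-mode file

# Writing text straight to a byte stream is rejected...
error = None
try:
    pipe.write("na\u00efve \u20ac")
except TypeError as exc:
    error = exc

# ...so the caller encodes first, as in the workaround described above.
written = send(pipe, "na\u00efve \u20ac")
```

The caller picks the encoding explicitly; UTF-8 is only the default assumed in this sketch.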

There is magic under the hood of course (see the perlunicode manpage), but the result is a completely transparent transition. That is why at my work we run perl5.8 and python2.4: Python upgrades always break something (for zero apparent benefit).



Unicode and Bytes

Posted Nov 19, 2009 12:19 UTC (Thu) by yeti-dn (guest, #46560)

I find it interesting that perl 5.6 introduced Unicode everywhere internally and no-one had to rewrite a thing.

Maybe in the US, because US-ASCII is good enough for everyone there (with a few reluctantly admitting that ISO-8859-1 exists too). But I remember the perl 5.6 release well precisely because it broke programs that were important (to me, anyway) and that worked with international text and re-encoded it. Somehow, perl 5.8 managed to break them again. Then I stopped counting.

