
Python user here

Posted Nov 14, 2009 0:08 UTC (Sat) by anselm (subscriber, #2796)
In reply to: Python user here by man_ls
Parent article: Python moratorium and the future of 2.x

The fun thing is that Tcl/Tk got Unicode essentially nailed more than 10 years ago (in version 8.2, to be exact), and transparently at that. None of those flag-day incompatibilities that seem to haunt the Python world. In Tcl there is no notion of separate »byte strings« and »Unicode strings«. Things just work. Most of the other languages that have been around for about as long (e.g., Perl or Python) are still struggling.

People may not like the Tcl language a lot (it is a bit of an acquired taste, to be sure) but much of the underlying engineering is really very good indeed.



Unicode and Bytes

Posted Nov 15, 2009 0:31 UTC (Sun) by pboddie (guest, #50784) [Link]

In Tcl there is no notion of separate »byte strings« and »Unicode strings«. Things just work.

Maybe they do. It's probably no accident that custody of Tcl was with Sun around that time (or slightly earlier), and that Java APIs also occasionally turn byte sequences into proper strings with a sprinkling of magic. I forget exactly where this was; probably the Servlet API, where you mostly do want strings, but where deficiencies in the standards require a bit of guesswork to provide correct strings (and not just bytes) to the API user who asks for a request parameter or part of a URL.

Of course, there's nothing to stop you storing byte values in a Unicode string type: Jython managed to do this, too. But again, cross your fingers that the right magic is being used.
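
To make the "byte values in a Unicode string" idea concrete, here is a minimal Python 3 sketch. It is not how Jython or Tcl do it internally; the helper names are made up for this example, and the choice of Latin-1 is an assumption. Latin-1 maps each byte 0-255 to the code point with the same number, so the round trip is lossless, but the string only reads as sensible text if the bytes really were Latin-1.

```python
def bytes_as_text(data: bytes) -> str:
    """Hypothetical helper: carry raw bytes inside a text string.

    Each byte becomes the code point with the same numeric value.
    """
    return data.decode("latin-1")


def text_as_bytes(text: str) -> bytes:
    """Hypothetical helper: recover the original bytes.

    Raises UnicodeEncodeError if the string holds code points above 255.
    """
    return text.encode("latin-1")


raw = "naïve".encode("utf-8")      # 6 bytes: the "ï" takes two bytes in UTF-8
smuggled = bytes_as_text(raw)      # a 6-character string, one per byte

assert text_as_bytes(smuggled) == raw   # the bytes survive the round trip
assert smuggled != "naïve"              # but the text is not what was meant
assert len(smuggled) == 6
```

The last two assertions are the "cross your fingers" part: nothing fails, yet the string is wrong unless someone later decodes it with the right encoding.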

Unicode and Bytes

Posted Nov 16, 2009 13:50 UTC (Mon) by kleptog (subscriber, #1183) [Link]

I find it interesting that perl 5.6 introduced Unicode everywhere internally and no-one had to rewrite a thing. Internally everything is Unicode, but you generally don't even notice. Meanwhile, all those people who *needed* Unicode support could write their programs, and all the extension modules just worked (for the most part).

For example, chr() now accepts numbers greater than 255, but that doesn't bother people much. Your *source* will still be interpreted as Latin-1, but that doesn't bother many people either, since you don't usually need Unicode in your source (I'm somewhat baffled by Python's "unicode" construct at source level; assuming the source was Latin-1 would have made transitions easier).

What mostly happened is that when you tried to send Unicode data over a pipe or to a file, you got an error. People filed bugs, the appropriate encode() call was added (or "use bytes" if people wanted to punt) and all was well. Until then, the workaround was to encode before calling the module, so it was no big deal.
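
The "encode before you hand it over" pattern can be sketched in Python 3 (this is an analogue, not Perl, and not a claim about what any particular Perl module did). The in-memory `io.BytesIO` buffer stands in for a pipe or a file opened in binary mode, and UTF-8 is an assumed choice of encoding.

```python
import io

pipe = io.BytesIO()            # stand-in for a pipe or binary-mode file
text = "caf\u00e9 \u263a"      # text with characters outside ASCII

# Writing text straight to a byte stream is rejected.
try:
    pipe.write(text)
except TypeError as exc:
    error = exc
else:
    error = None

# The fix: pick an encoding explicitly and write the resulting bytes.
pipe.write(text.encode("utf-8"))
written = pipe.getvalue()

assert isinstance(error, TypeError)
assert written.decode("utf-8") == text
```

The point of the sketch is only that the encoding decision sits at the boundary where text leaves the program, which is where the missing encode() calls described above ended up.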

There is magic under the hood, of course (see the perlunicode manpage), but the result is a completely transparent transition. That is why at my work we run perl 5.8 and Python 2.4: Python upgrades always break something (for zero apparent benefit).

Unicode and Bytes

Posted Nov 19, 2009 12:19 UTC (Thu) by yeti-dn (guest, #46560) [Link]

I find it interesting that perl 5.6 introduced Unicode everywhere internally and no-one had to rewrite a thing.

Maybe in the US, because US-ASCII is good enough for everyone there (with a few reluctantly admitting that ISO-8859-1 exists too). But I remember the perl 5.6 release well precisely because it broke important programs (important for me, anyway) that worked with international text and re-encoded it. Somehow, perl 5.8 managed to break them again. Then I stopped counting.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds