|
|
Subscribe / Log in / New account

Resetting PHP 6

Resetting PHP 6

Posted Mar 26, 2010 9:51 UTC (Fri) by ikm (guest, #493)
In reply to: Resetting PHP 6 by spitzak
Parent article: Resetting PHP 6

> First of all, the primary thing that happens in real programs is that the halves of the string get pasted back together

No, your example doesn't count -- this isn't string splitting, your resulting strings are intact there. The primary thing that happens in real programs is that they try to shorten the string, e.g. make "A very long string" into something like "A very lo...", to squeeze it in e.g. a fixed space of 12 characters, or do similar transformations. Those transformations can't be done correctly on raw 8-bit utf-8 strings.

> why is breaking a "character" really such a disaster? Why are we not worried about breaking "words"?

Because you're breaking the underlying encoding of the characters, not the characters itself. The resulting bitstream would be an invalid utf-8 sequence. Parts of english words you split would be rendered intact just fine, but damaged and invalid utf-8 would either result in no display at all, or in program/library barf. You can safely combine valid utf-8 sequences together, but you can't arbitrarily cut them and expect the result to be valid.

> Worrying about "breaking characters" is actually stupid, and is being used as an excuse to defend the bone-headed decision to use "wide characters".

As a Russian, I actually know how important this is. I've seen enough non-utf8 aware programs and observed enough of their horrendous problems to understand the importance of wide characters. What makes you so bold in your statements? You seem to know nothing about the topic.


to post comments


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds