Resetting PHP 6
Posted Mar 24, 2010 18:35 UTC (Wed) by ikm (guest, #493)
Parent article: Resetting PHP 6
This would look like the right approach to me.
Posted Mar 24, 2010 19:26 UTC (Wed)
by mrshiny (guest, #4266)
Posted Mar 26, 2010 4:02 UTC (Fri)
by spitzak (guest, #4593)
First of all, the primary thing that happens in real programs is that the halves of the string get pasted back together, such as when fixed-sized blocks are copied from one file to another. That does not destroy UTF-8 at all.
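To make the block-copy case concrete, here is a minimal Python sketch. The sample text and the block size are assumptions chosen for illustration, not taken from the comment. Individual blocks may end in the middle of a multi-byte character, yet the rejoined bytes are identical to the original.

```python
# Illustrative sketch: copy UTF-8 data in fixed-size blocks, then rejoin.
text = "Привет, мир"          # sample text (assumption for this sketch)
data = text.encode("utf-8")   # 20 bytes; each Cyrillic letter takes 2 bytes

BLOCK = 5  # odd size, so some blocks end halfway through a character

# Split into fixed-size blocks, as a file copy loop would.
blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

# Pasting the blocks back together restores the exact byte sequence.
rejoined = b"".join(blocks)
roundtrip_ok = (rejoined == data) and (rejoined.decode("utf-8") == text)
```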
Second, why is breaking a "character" really such a disaster? Why are we not worried about breaking "words"? If I split an English word in half, I will probably get two non-words. How can I possibly safely use a computer language that allows such things? Why, it seems hard to believe that word processors could be written when the computer would allow this horrible ability! /sarcasm
Worrying about "breaking characters" is actually stupid, and is being used as an excuse to defend the bone-headed decision to use "wide characters".
Posted Mar 26, 2010 9:51 UTC (Fri)
by ikm (guest, #493)
No, your example doesn't count -- this isn't string splitting, since your resulting strings are intact there. The primary thing that happens in real programs is that they try to shorten a string, e.g. turn "A very long string" into something like "A very lo...", to squeeze it into a fixed space of, say, 12 characters, or do similar transformations. Those transformations can't be done correctly on raw 8-bit UTF-8 strings.
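The shortening case can be sketched in Python as follows. The helper names `shorten_bytes` and `shorten_chars` and the Russian sample string are hypothetical, introduced only for this illustration. The first helper cuts the raw UTF-8 bytes; the second cuts by code points.

```python
def shorten_bytes(s: str, width: int) -> bytes:
    """Naive version: cut the UTF-8 bytes, ignoring character boundaries."""
    raw = s.encode("utf-8")
    if len(raw) <= width:
        return raw
    return raw[:width - 3] + b"..."


def shorten_chars(s: str, width: int) -> str:
    """Cut by code points, so no encoded character is split."""
    if len(s) <= width:
        return s
    return s[:width - 3] + "..."


ascii_result = shorten_chars("A very long string", 12)

sample = "Очень длинная строка"  # hypothetical non-ASCII input
chars_result = shorten_chars(sample, 12)

# The byte-level cut lands inside a 2-byte character here,
# so the result no longer decodes as UTF-8.
try:
    shorten_bytes(sample, 12).decode("utf-8")
    bytes_result_valid = True
except UnicodeDecodeError:
    bytes_result_valid = False
```

For pure ASCII input both helpers agree, which is why the problem tends to go unnoticed until non-ASCII text shows up.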
> why is breaking a "character" really such a disaster? Why are we not worried about breaking "words"?
Because you're breaking the underlying encoding of the characters, not the characters themselves. The resulting byte stream would be an invalid UTF-8 sequence. The parts of an English word you split would still render just fine, but damaged, invalid UTF-8 would result either in no display at all or in a program/library barf. You can safely concatenate valid UTF-8 sequences, but you can't cut them at arbitrary points and expect the result to be valid.
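A small Python check of both halves of that claim, as a sketch only. The helper `is_valid_utf8` is a hypothetical name for this example, and the two sample strings are assumptions.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


a = "при".encode("utf-8")  # 6 bytes, three 2-byte characters
b = "вет".encode("utf-8")  # 6 bytes, three 2-byte characters

# Concatenating two valid sequences yields a valid sequence.
joined_ok = is_valid_utf8(a + b)

# Cutting after 5 bytes leaves the first byte of a 2-byte character dangling.
cut_ok = is_valid_utf8((a + b)[:5])
```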
> Worrying about "breaking characters" is actually stupid, and is being used as an excuse to defend the bone-headed decision to use "wide characters".
As a Russian, I actually know how important this is. I've seen enough programs that are not UTF-8-aware, and observed enough of their horrendous problems, to understand the importance of wide characters. What makes you so bold in your statements? You seem to know nothing about the topic.