Iterators vs indices
Iterators vs indices
Posted Mar 30, 2010 6:57 UTC (Tue) by njs (subscriber, #40338)In reply to: Iterators vs indices by butlerm
Parent article: Resetting PHP 6
For some reason nobody does this, though.
Posted Mar 30, 2010 7:11 UTC (Tue)
by dlang (guest, #313)
[Link] (3 responses)
besides, as noted earlier in this thread, most uses of strings really don't care how they break apart, they are almost always used as-is (or at most with one step of parsing, usually on whitespace, on input) as such, anything more than the most compact representation ends up costing significantly more in memory size (and therefor cache space) than you gain with any string manipulation that you do
Google Wave actually stores strings the way you are suggesting, or did when I saw the presentation on it last year, but I think that doing so will keep it from being used for anything beyond trivial uses.
Posted Mar 30, 2010 7:46 UTC (Tue)
by njs (subscriber, #40338)
[Link] (2 responses)
The memory overhead is certainly not as high as UCS-32 (at least for strings where UTF-8 has lower overhead than UCS-32 to start with) -- you need something like 3*log_2(n) words of overhead, but n is the number of "chunks", not bytes, and a reasonable chunk-size is in the hundreds of bytes, at least. Within a chunk you revert to linear behavior, but that's not so bad, IIUC on modern CPUs linear-time is not much worse than constant-time when it comes to accessing short arrays.
Most strings are short, and with proper tuning they'd probably fit into one chunk anyway, so the overhead is nearly nil.
But you're right, there is some overhead -- not that this stops people from using scripting languages -- and a lot of tricky implementation, and simple solutions are often good enough.
I don't understand what you mean about Google Wave, though. A) Isn't it mostly a protocol? Where do string storage APIs come in? B) It's exactly the non-trivial uses -- where you have large, mutable strings -- that arrays and linear-time iteration don't scale to.
Posted Mar 31, 2010 2:05 UTC (Wed)
by dlang (guest, #313)
[Link] (1 responses)
google wave uses the jabber protocol, but in it's documents it doesn't store words, it stores the letters individually, grouped togeather so that they can be changed individually (or so it was explained by the google rep giving the presentation I was at)
Posted Mar 31, 2010 4:30 UTC (Wed)
by njs (subscriber, #40338)
[Link]
I'm afraid I don't understand at all. I *am* that poster, and the data structure I described can do O(log n) inserts without pointers to individual characters. Perhaps I am just explaining badly?
Posted Mar 30, 2010 8:12 UTC (Tue)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Mar 30, 2010 17:06 UTC (Tue)
by njs (subscriber, #40338)
[Link]
The most important difference is that ropes are happy -- indeed, delighted -- to store very long strings inside a single tree node when they have the chance, because their goal is just to amortize mutation operations, not to provide efficient access by semi-arbitrary index rules.
Iterators vs indices
Iterators vs indices
Iterators vs indices
Iterators vs indices
Iterators vs indices
Iterators vs indices