bad taste

Posted Mar 5, 2007 13:35 UTC (Mon) by nix (subscriber, #2304)
In reply to: bad taste by ldo
Parent article: Quote of the week

Also, length-prepended strings. :)

bad taste

Posted Mar 5, 2007 21:14 UTC (Mon) by k8to (guest, #15413) [Link] (3 responses)

Is there a real problem with length-indicated strings, or just with using them in environments that are built around something else?

Certainly using them where they don't belong is awful, but I don't see a problem with them fundamentally.

bad taste

Posted Mar 6, 2007 0:50 UTC (Tue) by ldo (guest, #40946) [Link]

Back in my Mac programming days (pre-OS-X), the APIs were full of "Pascal"-format strings, which started with a length byte. A maximum length of 255 may not sound like much, but I estimated that over 90% of the string objects in my programs fitted quite comfortably into this limit.

One thing: I was always careful to pass maximum buffer lengths. To reduce the chance of mistakes, I used macros like this:

#define Descr(v) (void *)&(v), sizeof (v)  /* v must be an array, not a pointer */

which I would use like this:

CopyString(Src, Descr(Dst));

That way, if the destination buffer was too small for the string, the worst that would happen was that it was truncated--you would never get overwriting of random memory.
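For anyone who hasn't seen this style, here is a minimal sketch of how a truncating copy could sit behind that macro. The signature and body of CopyString are assumptions for illustration (the original routine isn't shown), and PStr is a made-up name for a Pascal-format buffer. Descr is written with its argument parenthesized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pascal-format string: byte 0 holds the length, bytes 1..length the text. */
typedef unsigned char PStr;

/* Expands to two arguments: address and total size of the buffer.
   Only correct when v is a real array; for a pointer, sizeof gives
   the size of the pointer, not of the buffer. */
#define Descr(v) (void *)&(v), sizeof (v)

/* Hypothetical CopyString: copy src into dst, truncating to fit.
   dst_size is the full size of the destination, length byte included. */
static void CopyString(const PStr *src, void *dst, size_t dst_size)
{
    PStr *out = dst;
    size_t len = src[0];

    assert(dst_size >= 1);      /* need room for at least the length byte */
    if (len > dst_size - 1)
        len = dst_size - 1;     /* truncate rather than overrun */
    out[0] = (PStr)len;
    memcpy(out + 1, src + 1, len);
}
```

Copying an 11-character string into a 6-byte buffer with CopyString(Src, Descr(Dst)) leaves Dst holding a length byte of 5 and the first five characters; nothing past the end of Dst is touched.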

bad taste

Posted Mar 6, 2007 6:59 UTC (Tue) by njs (subscriber, #40338) [Link] (1 responses)

Dunno what nix was thinking either way, but I'd draw a real sharp distinction between "length prepended" and "length indicated". A string buffer plus a length sitting next to it (or wrapped up in a structure) = The Right Thing; a string buffer with its initial byte overloaded to indicate length = eewwwwwwww.
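To make the distinction concrete, a rough sketch in C of the two layouts. All the names here (PrependedStr, struct LenStr, lenstr_append) are invented for illustration and don't come from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Length prepended": byte 0 of the buffer doubles as the length,
   so the text can never exceed 255 bytes. */
typedef unsigned char PrependedStr[256];

/* "Length indicated": the length sits next to the buffer, not inside it. */
struct LenStr {
    size_t len;   /* bytes currently in use */
    size_t cap;   /* total bytes available in buf */
    char  *buf;   /* text; not necessarily NUL-terminated */
};

/* Append up to n bytes, truncating at capacity.
   Returns the number of bytes actually appended. */
static size_t lenstr_append(struct LenStr *s, const char *data, size_t n)
{
    size_t room;

    assert(s->len <= s->cap);   /* invariant of the structure */
    room = s->cap - s->len;
    if (n > room)
        n = room;
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    return n;
}
```

With the length in a size_t beside the buffer, the 255-byte ceiling goes away and the text bytes stay plain text, at the cost of a slightly larger header.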

bad taste

Posted Mar 6, 2007 22:52 UTC (Tue) by nix (subscriber, #2304) [Link]

Quite.

Plus, the length byte was never long enough, and the overhead of keeping
the length up to date dominated surprisingly often, even when
dead-reckoning it. At least with null-terminated strings you don't need to
work out the length unless you need it.

String ADTs make more sense :) Internally they can do whatever they like,
possibly varying the representation depending on the length.

I did this and more with an adaptive string ADT I wrote a few years ago
for an application my then employer wanted. If you kept asking for the
length of a string it started tracking the length itself; if you kept
inserting and deleting from it, and it was long enough, it switched the
representation to a buffer-gap; if you kept on asking for subsets of the
string and it was long enough to blow the dcache (I randomly picked 64KB),
it turned itself into a position-keyed binary tree, and if the string was
long enough and rarely-read enough it started zipping the longer and
more-rarely-referenced hunks up with zlib.
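
The first of those adaptations (start tracking the length once callers keep asking for it) might look roughly like the sketch below. This is not the ADT described above, which isn't public; the struct, the function names and the threshold of 3 are all assumptions for illustration. The buffer-gap, tree and zlib stages are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LEN_QUERY_THRESHOLD 3   /* arbitrary, for illustration only */

struct AdaptiveStr {
    char    *data;         /* NUL-terminated text in a caller-owned buffer */
    size_t   cached_len;   /* meaningful only once tracking is nonzero */
    unsigned len_queries;  /* how often the length has been asked for */
    int      tracking;     /* nonzero once the length is being maintained */
};

static void astr_init(struct AdaptiveStr *s, char *data)
{
    assert(data != NULL);
    s->data = data;
    s->cached_len = 0;
    s->len_queries = 0;
    s->tracking = 0;
}

/* Before the threshold this is a plain strlen(); after enough
   queries the length is cached and returned in constant time. */
static size_t astr_length(struct AdaptiveStr *s)
{
    size_t n;

    if (s->tracking)
        return s->cached_len;
    n = strlen(s->data);
    if (++s->len_queries >= LEN_QUERY_THRESHOLD) {
        s->cached_len = n;
        s->tracking = 1;
    }
    return n;
}

/* Append one character. cap is the total size of the data buffer.
   The cached length is kept up to date only when tracking is on.
   Returns 0 on success, -1 if the buffer is full. */
static int astr_push(struct AdaptiveStr *s, size_t cap, char c)
{
    size_t n = s->tracking ? s->cached_len : strlen(s->data);

    if (n + 1 >= cap)
        return -1;
    s->data[n] = c;
    s->data[n + 1] = '\0';
    if (s->tracking)
        s->cached_len = n + 1;
    return 0;
}
```

Strings whose length is never queried pay nothing for bookkeeping, which is the point made earlier about null-terminated strings; only the ones that keep getting asked start carrying a length around.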

There would doubtless be more tricks I could have used, but that was all I
needed to get performance up for that application. It wasn't dealing with
strings longer than 200MB, after all. ;)

One of these days I should rewrite it (I'd call it `clean it up' but it's
too dirty for that, it needs a rewrite, not least because I want to hold
the copyright this time) and release it, only there's probably no point as
someone else has doubtless written something much better.

(Judy trees, which I didn't discover till much later, of course knock the
socks off this, but their API uses such awful names for basically all its
functions that I've not yet been able to bring myself to use them. I
wonder if they'd accept a patch adding names that it's actually possible
to remember? The Great Lowercase Letter and Vowel Shortage ended *years*
ago...)