Null-Terminated Strings

Posted Nov 17, 2010 8:26 UTC (Wed) by nix (subscriber, #2304)
In reply to: Null-Terminated Strings by neilbrown
Parent article: Ghosts of Unix past, part 3: Unfixable designs

Nul-terminated strings have a huge problem: you can accidentally overwrite the nul, and then you're dead. But if you're doing that, you can accidentally overwrite bits of the inside of the string as well, and then you have wrong results! Is that better? Probably not. Oh, and overwriting off the start or end can break your memory allocator or stack frame anyway, so you'd be dead in any case, even if not using nul-terminated strings.
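
To make the overwritten-nul case concrete, here is a minimal C sketch. The helper name first_string_length and the buffer layout are made up for illustration; everything stays inside one array so the behaviour is well defined, whereas in a real program the scan would run on into whatever memory happens to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lays out "hello" and "world" as two adjacent
 * nul-terminated strings in one buffer, optionally clobbers the first
 * nul, and reports how long the first string now appears to be. */
size_t first_string_length(int clobber_nul)
{
    char buf[12];

    memcpy(buf, "hello", 6);      /* buf[0..5]: 'h','e','l','l','o','\0' */
    memcpy(buf + 6, "world", 6);  /* buf[6..11]: second string plus its nul */
    assert(buf[11] == '\0');      /* the only thing that stops a runaway scan */

    if (clobber_nul)
        buf[5] = '!';             /* the accidental overwrite */

    return strlen(buf);           /* scans until the next nul it meets */
}
```

Calling first_string_length(0) reports 5, while first_string_length(1) reports 11, because strlen() runs straight through into the neighbouring string.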

And then we have Pascal-layout strings (as opposed to actual Pascal 'strings', a nightmare for other reasons, see Kernighan). They don't fix this problem (you just have to overwrite the start of the string, not its end) and have two much bigger problems: finite string length, and an increase in size of every string. The finite string length means that writing general string-handling algorithms without special cases for the rare event of large strings is impossible, and the increase in size of every string bloats small strings, which are by far the common case. You can patch both of these: the first, by making the finite string length as large as a pointer; and the second, by noting that alignment constraints in existing systems bloat the effective size of strings anyway. But of course this soon turns into a special case of nul-terminated strings: point the pointer at the end of the string, bingo, one rather hard-to-consult nul by any other name.
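
The two costs named above can be sketched in C. The struct pstr8 and the helper names are assumptions for illustration only, not any particular Pascal implementation: a one-byte header forces the large-string special case, and a pointer-sized header removes the limit but makes every short string bigger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical Pascal-layout string: one length byte, then the characters. */
struct pstr8 {
    uint8_t len;        /* one-byte header caps the string at 255 bytes */
    char    data[255];  /* no terminator is stored */
};

/* Returns 0 on success, -1 if src does not fit in the one-byte header. */
int pstr8_set(struct pstr8 *s, const char *src)
{
    assert(s != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > UINT8_MAX)
        return -1;      /* the "rare event of large strings" special case */
    s->len = (uint8_t)n;
    memcpy(s->data, src, n);
    return 0;
}

/* Bytes needed to store n characters with a nul terminator. */
size_t cstr_storage(size_t n)
{
    return n + 1;
}

/* Bytes needed to store n characters behind a length header of the
 * given width, ignoring any alignment padding. */
size_t counted_storage(size_t n, size_t header_width)
{
    return header_width + n;
}
```

With an 8-byte header, a three-character string takes counted_storage(3, 8) = 11 bytes against cstr_storage(3) = 4, before alignment is taken into account.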

The biggest downside is probably a long-term ABI problem. The scheme is inflexible. If your Pascal string-length header is too short, however do you expand it? It's wired into every string-using program out there! At least nul-terminated strings need no expansion.

The real solution to string-handling unfortunately requires a VM of some description which can prevent the program from accidentally overwriting fields in aggregates by writes to any other field or variable. Then you can do reliable Pascal strings, separating the length from the content, or reliable null-terminated strings, with the separate compartment containing a pointer into the string. Unfortunately this is incompatible with low-level all-the-world's-a-giant-arena languages like C without very specialized fine-grained MMU hardware.

(I have, like everyone, written my own dynamic string-handling library when younger. It starts out simple but it's amazing how soon you have to introduce extra code to track pointers and make freeing them in error cases less verbose, and extra code to track memory leaks... you need that in C anyway of course but the massive increase in dynamic memory use that dynamically allocating most strings brings tends to force them on you sooner than otherwise.)


Null-Terminated Strings

Posted Nov 17, 2010 12:38 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

"The biggest downside is probably a long-term ABI problem. The scheme is inflexible. If your Pascal string-length header is too short, however do you expand it? It's wired into every string-using program out there! At least nul-terminated strings need no expansion."

Come on.

You'd naturally use 32 bits on 32-bit systems for string length. And by a strange coincidence, that's the maximum amount of contiguous RAM that you can address on 32-bit systems. On 64-bit systems, you'd naturally use a 64-bit counter.

Null-Terminated Strings

Posted Nov 18, 2010 16:40 UTC (Thu) by nix (subscriber, #2304)

Yes, I talked about that as well. Nice to know that reading five paragraphs of text is too much for you.

Null-Terminated Strings

Posted Nov 18, 2010 17:04 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

I read it completely. However, your point that "The biggest downside is probably a long-term ABI problem. The scheme is inflexible. If your Pascal string-length header is too short, however do you expand it? It's wired into every string-using program out there! At least nul-terminated strings need no expansion" is not correct.

Null-Terminated Strings

Posted Nov 25, 2010 13:10 UTC (Thu) by renox (subscriber, #23785)

I agree with him that Pascal strings are inflexible: think about two computers communicating with each other, one with a 32-bit CPU and one with a 64-bit CPU. If you use a machine word as the length, you have an issue with Pascal strings, but C strings don't care.
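
A rough C sketch of that mismatch, under stated assumptions: put_counted and put_cstr are hypothetical encoders, the header is written little-endian, and the header width is passed in explicitly so that both the 32-bit and 64-bit layouts can be produced on one machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder: writes len as a little-endian header of `width`
 * bytes (4 or 8, standing in for the sender's word size), followed by
 * the data. Returns the total number of bytes written. */
size_t put_counted(unsigned char *out, size_t width,
                   const char *data, uint32_t len)
{
    assert(width == 4 || width == 8);
    for (size_t i = 0; i < width; i++)
        out[i] = (i < 4) ? (unsigned char)(len >> (8 * i)) : 0;
    memcpy(out + width, data, len);
    return width + len;
}

/* Nul-terminated encoding: the same bytes whatever the word size. */
size_t put_cstr(unsigned char *out, const char *data)
{
    size_t n = strlen(data) + 1;  /* include the terminator */
    memcpy(out, data, n);
    return n;
}
```

Encoding "hi" with width 4 produces 6 bytes with the payload starting at offset 4; with width 8 it produces 10 bytes, and offset 4 is still part of the header. A receiver that assumes its own word size misreads the other side's data unless both agree on a fixed header width. The nul-terminated form is 3 bytes in both cases.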

Null-Terminated Strings

Posted Nov 25, 2010 16:22 UTC (Thu) by vonbrand (guest, #4458)

No, you haven't... (Original) Pascal "strings" were just (packed) arrays of characters of a fixed length.

/me ducks and runs for cover

Null-Terminated Strings

Posted Nov 19, 2010 11:24 UTC (Fri) by job (guest, #670)

A modern string-handling library would have to handle different character sets and different encodings as well, so there's already metadata to be stored with every string.

If memory efficiency is a problem for you, multibyte encodings are a much worse problem than storing string length. But UTF-8/16 is here to stay, there is simply no competition. I think we have to accept it.
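
As a small illustration of the multibyte cost, the following C sketch counts how many bytes a UTF-8 string spends beyond one per character. The function names are made up for this example, and the input is assumed to be valid, nul-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts code points in a UTF-8 string by skipping continuation bytes,
 * which all have the bit pattern 10xxxxxx. Assumes valid UTF-8 input. */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Bytes used beyond one byte per code point. */
size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_code_points(s);
}
```

For "na\xC3\xAFve" (the word "naïve"), strlen() sees 6 bytes for 5 code points, so one extra byte; for text outside Latin scripts the extra bytes can easily exceed the size of a length header.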


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds