
C, Fortran, and single-character strings

Posted Jun 22, 2019 5:05 UTC (Sat) by marcH (subscriber, #57642)
Parent article: C, Fortran, and single-character strings

> The C language famously does not worry much about the length of strings, which simply extend until the null byte at the end. Fortran, though, likes to know the sizes of the strings it is dealing with.

Not just Fortran but any remotely sane/safe/modern language including C++. Even newer and safer C APIs.

Some other comment mentioned "cargo cult programming techniques": null-terminated strings are probably one of the top examples of that. Is any other language doing it?



C, Fortran, and single-character strings

Posted Jun 22, 2019 17:45 UTC (Sat) by ncm (guest, #165) [Link] (7 responses)

Every language that has to interact with C does, which today is all of them.

But it's not the only dodgy practice around strings, and they are accumulating at an impressive rate. A lot of Pascal-family languages store (or stored) the length in the first byte, with no great answer to how to handle a longer string. Others use the first two bytes. Lots of languages switched to two-byte first-generation Unicode, but have no concept of normalizing different representations with modifier code points, so e.g. strings that produce the same set of glyphs compare unequal, and there is no concept of a character representable only as a pair of inseparable code units.
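To make the one-byte length prefix concrete, here is a minimal sketch in C. The `pstring` type and `pstring_from_cstr` function are hypothetical names invented for illustration and do not correspond to any particular Pascal implementation's layout; the point is only that a single length byte caps the payload at 255 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical Pascal-style short string: one length byte followed by
   the payload, so at most 255 bytes fit. No terminator is stored. */
typedef struct {
    unsigned char len;
    char data[255];
} pstring;

/* Converts a NUL-terminated C string into the length-prefixed form.
   Returns 0 on success, -1 if the input is too long for one length byte. */
static int pstring_from_cstr(pstring *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);      /* O(n) scan for the terminator */
    if (n > 255)
        return -1;               /* the "longer string" problem */
    dst->len = (unsigned char)n; /* from here on, length is O(1) */
    memcpy(dst->data, src, n);
    return 0;
}
```

A two-byte prefix raises the cap to 65535 bytes but leaves the same question open at that limit.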

Unicode has characters that have no visible glyph and take no space, so they could be sprinkled anywhere, and lots of code points have glyphs necessarily identical to others, yet normalization isn't allowed to fold them into one. Lots of languages have adopted UTF-8, but have not tackled any of these similar problems.

Getting exercised over the choice of representing length with a null terminator will leave you entirely unequipped for the much bigger problems that matter.

C, Fortran, and single-character strings

Posted Jun 22, 2019 18:12 UTC (Sat) by marcH (subscriber, #57642) [Link] (6 responses)

> Lots of languages switched to two-byte first generation Unicode,

I was referring to *memory* length from a safety and performance perspective.

> Every language that has to interact with C does, which today is all of them.

Yeah, sure. Off-topic too.

C, Fortran, and single-character strings

Posted Jun 22, 2019 21:04 UTC (Sat) by ncm (guest, #165) [Link] (5 responses)

If C strings are evidence of cargo-cultish programming, so are all other string implementations, without exception. Nobody gets a pass, or a diploma.

Null termination is an example of a venerable programming practice, the use of sentinel elements, lately fallen from favor now that memory and cycles are thousands, millions, or even billions of times cheaper than they once were.
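As a rough illustration of the sentinel technique, the sketch below counts occurrences of a byte twice: once relying on the NUL sentinel, once with an explicit length. Both function names are made up for this example. The sentinel version needs only a pointer, while the counted version carries one extra word but tolerates embedded NUL bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel-terminated: only a pointer is passed; the loop stops when it
   reaches the NUL byte that marks the end of the data. */
static size_t count_char_sentinel(const char *s, char c)
{
    assert(s != NULL);
    size_t hits = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            hits++;
    return hits;
}

/* Length-carrying: the caller supplies the size, so the data may contain
   NUL bytes and no terminator is required. */
static size_t count_char_counted(const char *s, size_t len, char c)
{
    assert(s != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            hits++;
    return hits;
}
```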

If we sneer at choices made then, under the constraints of the time, how much more derision do we deserve for unfortunate choices made without such constraints? 'Cause I could list such, all day long, about any system, language, or technology you can think of.

C, Fortran, and single-character strings

Posted Jun 22, 2019 21:37 UTC (Sat) by marcH (subscriber, #57642) [Link] (4 responses)

Yes, some of C's unsafe choices were made in a completely different context, and yes, of course many of these choices were required to get anything done in a reasonable time on hardware 50 years older (and not under constant attack). For this particular question, however, if we can stop digressing for a minute, I can't see any massive performance advantage in having a marker at the end of an array compared to storing its length somewhere near the start. Maybe for some operations, but clearly not for others. By the way, Fortran is older than C and is still in use today too because of its... performance. LISP is even older; it's now renowned for its performance yet it was somehow getting some work done at the time too.

> 'Cause I could list such, all day long, about any system, language, or technology you can think of.

Sure, let's start by looking at some CVE statistics. Wait, I said no digression sorry.

C, Fortran, and single-character strings

Posted Jun 23, 2019 4:04 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

C-style strings are really the only choice for a language like C. Since your primitive is a (regular) pointer you can't pass length naturally.

C strings allow you to pass substrings as a pair of pointers (or just one pointer for tail substrings), for example.
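A small sketch of both cases, using only `strchr` and `strlen` from `<string.h>`. The `tail_after` and `first_field` helpers and the `span` struct are hypothetical names for illustration. Neither function copies or allocates; the results point into the caller's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tail substring: a single pointer into the same buffer. It shares the
   original terminator, so it is itself a valid C string. */
static const char *tail_after(const char *s, char sep)
{
    assert(s != NULL);
    const char *p = strchr(s, sep);
    return p ? p + 1 : NULL;   /* NULL if the separator is absent */
}

/* Interior substring: a half-open pointer pair [begin, end). It is NOT
   NUL-terminated, so callers must use the pair, not strlen(). */
struct span {
    const char *begin;
    const char *end;
};

static struct span first_field(const char *s, char sep)
{
    assert(s != NULL);
    const char *p = strchr(s, sep);
    struct span sp = { s, p ? p : s + strlen(s) };
    return sp;
}
```

The pointer pair is only valid for as long as the underlying buffer is alive and unmodified.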

C, Fortran, and single-character strings

Posted Jun 23, 2019 18:02 UTC (Sun) by marcH (subscriber, #57642) [Link] (2 responses)

> Since your primitive is a (regular) pointer you can't pass length naturally.

Yes the type of (safer) arrays would have been one step above "primitive".

Looking at string.h on opengroup.org, it's interesting to see almost half the functions there already have some size_t argument.

> C strings allow you to pass substrings as a pair of pointers (or just one pointer for tail substrings), for example.

This is indeed a performance optimization. It's also a dangerous one if the array is not const (who owns it now?), and I don't see how "higher level" arrays would stop you from still doing that; they would just discourage you from doing it routinely in non-critical paths.

C, Fortran, and single-character strings

Posted Jun 23, 2019 21:07 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Yes the type of (safer) arrays would have been one step above "primitive".

Sure, but C was designed without such arrays. And a language with safe arrays won't be C.

I'm not saying that it's a good idea now, but null-terminated strings certainly make sense in C.

C, Fortran, and single-character strings

Posted Jun 25, 2019 16:41 UTC (Tue) by rgmoore (✭ supporter ✭, #75) [Link]

> Sure, but C was designed without such arrays. And a language with safe arrays won't be C.

No. A language with only safe arrays won't be C. C is supposed to provide access to low-level functions and that includes unsafe pointers and arrays. But C is also supposed to allow programmers to build higher-level abstractions, including things like safe arrays and strings, and there's excellent reason to use those safe arrays and strings in place of the unsafe alternatives when performance is not critical.
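One way such an abstraction might look, as a minimal sketch only: a length-carrying view with bounds-checked access, built on plain pointers. The `str_view` type and the `sv_*` functions are hypothetical and not part of the C standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string view; it does not own its data. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* Pays for strlen() once, at the boundary with C-string code. */
static str_view sv_from_cstr(const char *s)
{
    assert(s != NULL);
    str_view v = { s, strlen(s) };
    return v;
}

/* Bounds-checked read: returns false instead of reading past the end. */
static bool sv_at(str_view v, size_t i, char *out)
{
    if (i >= v.len)
        return false;
    *out = v.ptr[i];
    return true;
}

/* Checked slice: fails unless [start, start + count) fits in the view.
   The comparison is written to avoid overflow in start + count. */
static bool sv_slice(str_view v, size_t start, size_t count, str_view *out)
{
    if (start > v.len || count > v.len - start)
        return false;
    out->ptr = v.ptr + start;
    out->len = count;
    return true;
}
```

The raw pointer is still reachable through `ptr`, so hot paths can drop down to unchecked access where that is justified.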


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds