C, Fortran, and single-character strings

Posted Jun 21, 2019 6:24 UTC (Fri) by bokr (guest, #58369)
Parent article: C, Fortran, and single-character strings

Could some workaround be based on passing single-character strings as
single characters :) (of some character type)?

C, Fortran, and single-character strings

Posted Jun 21, 2019 17:29 UTC (Fri) by valarauca (guest, #109490)

> Could some workaround be based on passing single-character strings as
> single characters :) (of some character type)?

But what is a "character"?

UTF-8 doesn't have "characters" anymore. It has "glyphs", each of which is a "displayable symbol", but some "glyphs" require multiple "Unicode scalar values" (each normally stored as a `uint32_t`). So even a "character" isn't just one value anymore.

Sure, we can pretend the rest of the world doesn't exist and ASCII is the only text standard, but that seems extremely short-sighted to burn into an ABI.

C, Fortran, and single-character strings

Posted Jun 21, 2019 21:17 UTC (Fri) by pbonzini (subscriber, #60935)

That's not a problem: a Fortran character means "byte".

However, older Fortran programs didn't have prototypes, so the parent comment's suggestion is not applicable.

C, Fortran, and single-character strings

Posted Jun 26, 2019 9:05 UTC (Wed) by tialaramex (subscriber, #21167)

No.

UTF-8 consists of 8-bit _code units_, strings of which (and similarly for UCS-2, UTF-7, UTF-16 and UCS-4) can be decoded to get Unicode _code points_, which are just integers from an enormous enumeration with names, like "LATIN CAPITAL LETTER A", and a shared understanding of what they mean.
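The difference between code units and code points can be made concrete with a small C sketch. The function names `utf8_code_units` and `utf8_code_points` are made up for illustration, and the code assumes its input is already well-formed UTF-8 (it does no validation).

```c
#include <stddef.h>
#include <string.h>

/* Count 8-bit code units (bytes), excluding the terminating NUL. */
size_t utf8_code_units(const char *s)
{
    return strlen(s);
}

/* Count Unicode code points in a NUL-terminated UTF-8 string.
 * Assumption: the input is well-formed UTF-8; nothing is validated.
 * Every byte that is NOT a continuation byte (bit pattern 10xxxxxx)
 * starts a new code point, so counting those bytes counts code points. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the two bytes 0xC3 0xA9 (U+00E9) this gives two code units and one code point; for "e" followed by the bytes 0xCC 0x81 (U+0301, a combining accent) it gives three code units and two code points, even though most readers would see a single accented letter.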

"Glyphs" are the pretty pictures in a typeface. Unicode isn't directly concerned with how typefaces work: it is indifferent to whether you choose to have lots of pretty pictures and do some extra work picking from those to draw text, or very few pretty pictures and do different extra work assembling those to draw text. In particular, Unicode doesn't care about allographs at all by default; that's seen as purely a typeface problem (although, for reasons to do with its mission to replace all previous text encodings, it in fact encodes a LOT of allographs).

You should definitely treat the word "character" as a code smell: whatever is going on there will usually be at least confusing and an opportunity for bugs, if not itself directly a bug. That's sad for a lot of older programming languages which use the word "character" all the time. Too bad. See also "number" when used to actually mean something far more limited, like an integer, or a float, or a real.

C, Fortran, and single-character strings

Posted Jun 21, 2019 18:01 UTC (Fri) by excors (subscriber, #95769)

It looks like the arguments are not really single characters. They're strings, where the function only cares about the first character.

E.g. http://www.netlib.org/lapack/explore-html/d7/d03/dpptrs_8... does "CALL dtpsv( 'Upper', 'Transpose', 'Non-unit', ...)", where the documentation and definition of dtpsv show it just compares the first argument to 'U'/'L', the second to 'N'/'T'/'C', the third to 'U'/'N', using "lsame" (case-insensitive comparison of the first character).

(This seems a really stupid way to design an API, even without the cross-language issue, because the compiler can't type-check the strings to detect typos. I guess Fortran didn't/doesn't have anything equivalent to C enums?)
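To illustrate the behaviour described above, here is a rough C sketch of a case-insensitive comparison that looks only at the first character. The name `same_first_char` is hypothetical; this is not LAPACK's `lsame`, just an approximation of what the comment says it does, and it assumes both arguments contain at least one readable byte.

```c
#include <ctype.h>
#include <stdbool.h>

/* Compare only the first character of each argument, ignoring case.
 * Assumption: both pointers refer to at least one readable byte.
 * Everything after the first character is ignored entirely. */
bool same_first_char(const char *a, const char *b)
{
    return toupper((unsigned char)a[0]) == toupper((unsigned char)b[0]);
}
```

With this kind of check, "Upper", "upper" and even a misspelling such as "Uper" all match 'U', which is why a typo after the first letter goes unnoticed.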

C, Fortran, and single-character strings

Posted Jun 21, 2019 19:02 UTC (Fri) by joib (subscriber, #8541)

Fortran doesn't have a separate type for a single character/glyph/grapheme/whatever. There's just the CHARACTER type which is, well, what many other languages call a string.

The LAPACK interface is Fortran 77, which didn't have derived types (structs in C) or enums like modern Fortran has, so it certainly is a lot more limited than what you'd be able to do today.

C, Fortran, and single-character strings

Posted Jun 21, 2019 21:20 UTC (Fri) by pbonzini (subscriber, #60935)

No, because you don't have prototypes, and when passing "a" the receiver could expect either a length-1 string or an unknown-length string. In the latter case the length would be needed; in the former it wouldn't.

Also, it would be an ABI break.
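For readers unfamiliar with why the length matters, here is a minimal C sketch of one common calling convention, in which the string length travels as a hidden trailing argument. Everything in it is an assumption for illustration: `is_upper_flag` is a C stand-in for a Fortran routine, `call_with_char` is a hypothetical wrapper, and the type and position of the hidden length argument vary between compilers and versions.

```c
#include <stddef.h>
#include <ctype.h>

/* C stand-in for a Fortran routine taking one CHARACTER argument.
 * Assumption: the compiler passes the string by reference and appends
 * its length as a hidden by-value trailing argument (size_t here; other
 * compilers or versions may use a different integer type). */
int is_upper_flag(const char *uplo, size_t uplo_len)
{
    /* The string is not NUL-terminated on the Fortran side, so only
     * uplo_len bytes may be read. */
    return uplo_len >= 1 && toupper((unsigned char)uplo[0]) == 'U';
}

/* Hypothetical C-side wrapper: passes a single character together with
 * an explicit length of 1, so the callee never reads a missing argument. */
int call_with_char(char c)
{
    return is_upper_flag(&c, 1);
}
```

A callee compiled to expect an unknown-length string will read that trailing length whether or not the caller supplied it, so a caller without prototypes cannot safely leave it out.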