C, Fortran, and single-character strings
Posted Jun 21, 2019 6:24 UTC (Fri) by bokr (guest, #58369)
Parent article: C, Fortran, and single-character strings
single characters :) (of some character type)?
Posted Jun 21, 2019 17:29 UTC (Fri)
by valarauca (guest, #109490)
[Link] (2 responses)
But what is a "character"?
UTF-8 doesn't have "characters" anymore. It has "glyphs", each of which is a "displayable symbol", but some "glyphs" require multiple "Unicode scalar values" (each normally stored as a `uint32_t`). So even a "character" isn't just one value anymore.
Sure, we can pretend the rest of the world doesn't exist and ASCII is the only text standard, but that seems extremely short-sighted to burn into an ABI.
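A minimal Python sketch of the point that one displayed symbol can need more than one scalar value. The sample strings are illustrative choices, not taken from the thread; Python's `len()` on a `str` counts code points.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT (U+0301): drawn as one symbol,
# but stored as two Unicode scalar values.
decomposed = "e\u0301"

# The precomposed form, LATIN SMALL LETTER E WITH ACUTE (U+00E9).
precomposed = "\u00e9"

print(len(decomposed))            # 2
print(len(precomposed))           # 1
print(decomposed == precomposed)  # False

# NFC normalization composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Not every sequence has a precomposed form, so normalization does not in general reduce a displayed symbol to a single value.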
Posted Jun 21, 2019 21:17 UTC (Fri)
by pbonzini (subscriber, #60935)
[Link]
However older Fortran programs didn't have prototypes, so the parent comment's suggestion is not applicable.
Posted Jun 26, 2019 9:05 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link]
UTF-8 consists of 8-bit _code units_, strings of which (and similarly for UCS-2, UTF-7, UTF-16 and UCS-4) can be decoded to get Unicode _code points_, which are just integers from an enormous enumeration with names, like "Latin Capital A" and a shared understanding of what they mean.
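The distinction between code units and code points can be illustrated with a short Python sketch. The helper names (`code_points`, `utf8_units`, `utf16_units`) and the sample string are made up for this example.

```python
def code_points(text):
    """Return the Unicode code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def utf8_units(text):
    """Return the 8-bit UTF-8 code units of text as integers."""
    return list(text.encode("utf-8"))


def utf16_units(text):
    """Return the 16-bit UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


# Four code points: A, e-acute, euro sign, grinning face emoji.
sample = "A\u00e9\u20ac\U0001F600"

print(code_points(sample))       # ['U+0041', 'U+00E9', 'U+20AC', 'U+1F600']
print(len(utf8_units(sample)))   # 10 (1 + 2 + 3 + 4 code units)
print(len(utf16_units(sample)))  # 5 (the emoji needs a surrogate pair)
```

The same four code points therefore occupy a different number of code units depending on the encoding, which is why "one character" does not map to a fixed-size value.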
"Glyphs" are the pretty pictures in a typeface. Unicode isn't directly concerned with how typefaces work; it is indifferent to whether you choose to have lots of pretty pictures and do some extra work to pick from those to draw text, or very few pretty pictures and do different extra work assembling those to draw text. In particular, Unicode doesn't care about allographs at all by default; that's seen as purely a typeface problem (though, for reasons to do with its mission to replace all previous text encodings, it in fact encodes a LOT of allographs).
You should definitely treat the word "character" as a code smell: whatever is going on there will usually be at least confusing and an opportunity for bugs, if not directly a bug itself. That's sad for a lot of older programming languages which use the word "character" all the time. Too bad. See also "number" when used to actually mean something far more limited, like an integer, or a float, or a real.
Posted Jun 21, 2019 18:01 UTC (Fri)
by excors (subscriber, #95769)
[Link] (1 responses)
E.g. http://www.netlib.org/lapack/explore-html/d7/d03/dpptrs_8... does "CALL dtpsv( 'Upper', 'Transpose', 'Non-unit', ...)", where the documentation and definition of dtpsv show it just compares the first argument to 'U'/'L', the second to 'N'/'T'/'C', the third to 'U'/'N', using "lsame" (case-insensitive comparison of the first character).
(This seems a really stupid way to design an API, even without the cross-language issue, because the compiler can't type-check the strings to detect typos. I guess Fortran didn't/doesn't have anything equivalent to C enums?)
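The first-character comparison described above can be sketched in Python. This is only an analogue of the behaviour, not LAPACK's actual `lsame` source; `parse_uplo` is a hypothetical helper invented for the example.

```python
def lsame(ca, cb):
    """Compare the first characters of ca and cb, ignoring case.

    A rough analogue of the comparison described in the comment,
    not a translation of the LAPACK routine.
    """
    if not ca or not cb:
        return False
    return ca[0].upper() == cb[0].upper()


def parse_uplo(uplo):
    """Hypothetical helper: map an 'Upper'/'Lower' style argument to 'U' or 'L'."""
    if lsame(uplo, "U"):
        return "U"
    if lsame(uplo, "L"):
        return "L"
    raise ValueError(f"invalid triangle selector: {uplo!r}")


print(parse_uplo("Upper"))  # U
print(parse_uplo("lower"))  # L
print(parse_uplo("Upprr"))  # U (the typo goes unnoticed)
```

Because only the first character is inspected, any misspelling that keeps the right initial letter is silently accepted, which is the type-checking weakness the comment points out.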
Posted Jun 21, 2019 19:02 UTC (Fri)
by joib (subscriber, #8541)
[Link]
The LAPACK interface is Fortran 77, which didn't have derived types (structs in C) or enums like modern Fortran has, so it certainly is a lot more limited than what you'd be able to do today.
Posted Jun 21, 2019 21:20 UTC (Fri)
by pbonzini (subscriber, #60935)
[Link]
Also it would be an ABI break.