
C, Fortran, and single-character strings

Posted Jun 21, 2019 17:29 UTC (Fri) by valarauca (guest, #109490)
In reply to: C, Fortran, and single-character strings by bokr
Parent article: C, Fortran, and single-character strings

> Could some workaround be based on passing single-character strings as single characters :) (of some character type)?

But what is a "character"?

UTF-8 doesn't have "characters" anymore. It has "glyphs", which are "displayable symbols", but some "glyphs" require multiple "Unicode scalar values" (what is normally a `uint32_t`). So even a "character" isn't just one value anymore.
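
The point that one displayed symbol can take more than one value can be sketched in Python (chosen only for illustration; the thread itself is about C and Fortran). The helper `describe` is hypothetical. The sketch spells the same visible "é" two ways, precomposed as U+00E9 and decomposed as U+0065 followed by the combining acute accent U+0301, and counts UTF-8 bytes and code points for each; Python's `str` is a sequence of code points.

```python
import unicodedata

def describe(s):
    """Return (UTF-8 byte count, code point count) for s (hypothetical helper)."""
    return len(s.encode("utf-8")), len(s)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(describe(precomposed))  # (2, 1)
print(describe(decomposed))   # (3, 2)

# NFC normalization composes the pair back into the single precomposed code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both strings display as one symbol. Counting such user-perceived units (extended grapheme clusters, in Unicode's terms) needs a segmentation algorithm such as the one in Unicode Standard Annex #29, which Python's standard library does not provide.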

Sure, we can pretend the rest of the world doesn't exist and ASCII is the only text standard, but that seems extremely shortsighted to burn into an ABI.


C, Fortran, and single-character strings

Posted Jun 21, 2019 21:17 UTC (Fri) by pbonzini (subscriber, #60935)

That's not a problem; in Fortran, "character" means "byte".

However, older Fortran programs didn't have prototypes (explicit interfaces), so the parent comment's suggestion is not applicable.

C, Fortran, and single-character strings

Posted Jun 26, 2019 9:05 UTC (Wed) by tialaramex (subscriber, #21167)

No.

UTF-8 consists of 8-bit _code units_; sequences of them (and likewise for UCS-2, UTF-7, UTF-16, and UCS-4) can be decoded to get Unicode _code points_, which are just integers from an enormous named enumeration, such as "LATIN CAPITAL LETTER A", with a shared understanding of what each one means.
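
Decoding code units into code points can be sketched by hand in Python (illustrative only). `utf8_code_points` is a hypothetical helper, not a library function, and it is a minimal decoder under stated assumptions: it rejects bad lead bytes, bad continuation bytes, and truncated sequences, but does not check for overlong forms, surrogates, or values above U+10FFFF. Real code should use the language's own decoder (here `bytes.decode("utf-8")`), which does reject those. `unicodedata.name` shows the named enumeration the comment describes.

```python
import unicodedata

def utf8_code_points(data):
    """Decode UTF-8 code units (bytes) into a list of code point integers.

    Minimal sketch: no checks for overlong forms, surrogates, or > U+10FFFF.
    """
    points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: one code unit
            cp, n = lead, 1
        elif 0xC0 <= lead < 0xE0:     # 110xxxxx: two code units
            cp, n = lead & 0x1F, 2
        elif 0xE0 <= lead < 0xF0:     # 1110xxxx: three code units
            cp, n = lead & 0x0F, 3
        elif 0xF0 <= lead < 0xF8:     # 11110xxx: four code units
            cp, n = lead & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError("truncated sequence")
        for b in data[i + 1:i + n]:
            if (b & 0xC0) != 0x80:    # continuation units look like 10xxxxxx
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (b & 0x3F)   # append 6 payload bits
        points.append(cp)
        i += n
    return points

# 'A', e with acute, euro sign, and a four-byte emoji (U+1F600).
data = "A\u00e9\u20ac\U0001F600".encode("utf-8")
points = utf8_code_points(data)
print(len(data), len(points))                 # 10 4
print([f"U+{cp:04X}" for cp in points])       # ['U+0041', 'U+00E9', 'U+20AC', 'U+1F600']
print(unicodedata.name(chr(points[0])))       # LATIN CAPITAL LETTER A
```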

"Glyphs" are the pretty pictures in a typeface. Unicode isn't directly concerned with how typefaces work: it is agnostic about whether you choose to have lots of pretty pictures and do some extra work to pick from among them when drawing text, or very few pretty pictures and do different extra work to assemble them into text. In particular, Unicode doesn't care about allographs at all by default; that is seen as purely a typeface problem (although, for reasons to do with its mission to replace all previous text encodings, it does in fact encode a LOT of allographs).

You should definitely treat the word "character" as a code smell: whatever is going on there will usually be at least confusing and an opportunity for bugs, if not itself directly a bug. That's sad for a lot of older programming languages, which use the word "character" all the time. Too bad. See also "number" when it is actually used to mean something far more limited, like an integer, a float, or a real.
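
One concrete instance of the kind of bug the word invites is "truncate to N characters". The Python sketch below is illustrative only, and `truncate_utf8` is a hypothetical helper, not a library function. Cutting UTF-8 at an arbitrary byte can split a code point; backing up to a code-point boundary fixes that, but slicing by code points can still separate a base letter from its combining accent.

```python
def truncate_utf8(data, max_bytes):
    """Cut valid UTF-8 `data` to at most max_bytes without splitting a code point.

    Assumes `data` is valid UTF-8 and max_bytes >= 0. If the byte at the cut
    position is a continuation byte (10xxxxxx), back up to the start of that
    code point.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]

word = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9': 5 bytes, 4 code points

naive = word[:4]                     # b'caf\xc3': the cut splits the e-acute
try:
    naive.decode("utf-8")
except UnicodeDecodeError:
    print("naive cut produced invalid UTF-8")

print(truncate_utf8(word, 4))        # b'caf'

# A code-point boundary is still not a user-perceived boundary:
decomposed = "cafe\u0301"            # 'e' followed by COMBINING ACUTE ACCENT
print(decomposed[:4])                # cafe (the accent is silently dropped)
```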


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds