About type inference coming to the C language as well [LWN.net]

About type inference coming to the C language as well

Posted Dec 10, 2023 9:00 UTC (Sun) by swilmet (subscriber, #98424)

(Oops, posted my comment as a sub-comment instead of a new top-level one, I clicked on the wrong reply button…)

About type inference coming to the C language as well

Posted Dec 10, 2023 11:52 UTC (Sun) by excors (subscriber, #95769)

I think an important point that's missing from your argument is that modern languages have much more sophisticated type systems than C, with features like generics, and modern libraries make use of those type systems, so type names are very commonly much longer (and sometimes impossible) to write. If you don't have type inference, the language will be restricted to much simpler types, and you lose the correctness and performance benefits of having more information statically encoded in types.

Like, using `auto` instead of `const char*` or `ArrayList<String>` isn't a huge benefit, because those are pretty simple types. But when you're regularly writing code like:

for (std::map<std::string, std::string>::iterator it = m.begin(); it != m.end(); ++it) { ... }

then it gets quite annoying, since the type name makes up half the line, and it obscures the high-level intent of the code (which is simply to iterate over `m`). (And that's not the real type anyway; `std::string` is the templated `std::basic_string<char>`, and the `iterator` is a typedef which is documented to be a LegacyBidirectionalIterator which is a LegacyForwardIterator which is a LegacyIterator which specifies the `++it` operation etc, so in practice you're not going to figure out how the type behaves from the documentation - you're really going to need a type-aware text editor or IDE, at least until you've memorised enough of the typical library usage patterns. That's just an obligatory part of modern programming.)

Or in Rust you might rely on type inference like:

let v = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
let vals: Vec<i32> = v.collect();

where you can see the important information (that it ends up with a vector of ints), and you can assume `v` is some sort of iterable thing but you don't care exactly what. Writing it explicitly would be something terrible like:

let v: std::iter::Map<std::str::SplitAsciiWhitespace<'_>, impl Fn(&str) -> i32> = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
let vals: Vec<i32> = v.collect();

except that won't actually work because the `'_` is referring to a lifetime which I don't think there is any way to express in code; and the closure is actually an anonymous type (constructed by the compiler to contain any captured variables) which implements the `Fn` trait, and you can only use the `impl Trait` syntax in argument types (where it's a form of generics) and return types (where it's a kind of information hiding), not in variable bindings, so there's no way to name the closure type. Rust's statically-checked lifetimes and non-heap-allocated closures are useful features that simply can't work without type inference.
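To make the closure-type point concrete, here is a minimal sketch (function names are hypothetical). The parsing function uses the same inferred `map(|s| s.parse().unwrap())` chain as above, with the target of `parse()` inferred backwards from the `Vec<i32>` return type. The second function shows that an anonymous closure type can still cross a function boundary when the return type is written as `impl Fn(&str) -> i32`, which is the return-position use of `impl Trait` mentioned above.

```rust
// Parses whitespace-separated integers. The type of `v` (a Map over
// SplitAsciiWhitespace with an anonymous closure) is never written down,
// and the target type of `s.parse()` is inferred from the return type.
fn parse_ints(line: &str) -> Vec<i32> {
    let v = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
    v.collect()
}

// The closure's concrete type has no name, so the return type only
// promises "some type implementing Fn(&str) -> i32".
fn make_parser() -> impl Fn(&str) -> i32 {
    |s: &str| s.parse::<i32>().unwrap()
}
```

Note that the `impl Fn(...)` spelling is accepted here, in return position, but not as the type of a `let` binding on stable Rust.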

About type inference coming to the C language as well

Posted Dec 10, 2023 21:52 UTC (Sun) by tialaramex (subscriber, #21167)

Yeah, I'm a caveman, often working in Vim without any special development features, yet I am not bothered at all to see e.g. let chars = foo.bar().into_iter();

Sure, I have no idea what "type" chars actually is, but it's clearly some sort of Iterator, and somebody named it chars, so I feel entitled to assume it's an impl Iterator<Item = char> unless the context makes it obvious that it isn't.
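That assumption can also be handed to the compiler without ever naming the concrete type. In this small sketch (the function is hypothetical), the caller still writes `let chars = ...` with no annotation, and passing it to a function bounded by `impl Iterator<Item = char>` turns the reader's guess into a compile-time check.

```rust
// Accepts any iterator of chars; the caller's concrete iterator type
// (here std::str::Chars) never has to be spelled out.
fn count_vowels(chars: impl Iterator<Item = char>) -> usize {
    chars.filter(|c| "aeiouAEIOU".contains(*c)).count()
}
```

If `chars` turned out to yield something other than `char`, the call would fail to compile rather than silently doing the wrong thing.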

If anything I think I more often resent needing to spell out types for e.g. constants, where I'm obliged to write const MAX_EXPIRY: MyDayType = 398; rather than let the compiler figure out that's the only correct type. I don't hate that enough to think it should be changed (it makes sense), but I definitely run into it more often than I regret not knowing the type of chars in a construction like let chars = foo.bar().into_iter()
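A short sketch of that asymmetry, assuming a hypothetical `MyDayType` alias for `u16`: the `const` item must state its type, while an ordinary `let` binding of a bare literal gets its type inferred from how the variable is used later.

```rust
// Hypothetical stand-in for the comment's MyDayType.
type MyDayType = u16;

// `const` items always need an explicit type; this is not inferred.
const MAX_EXPIRY: MyDayType = 398;

fn is_expired(age_days: MyDayType) -> bool {
    age_days > MAX_EXPIRY
}

fn check() -> bool {
    // No annotation: the literal's type (MyDayType, i.e. u16) is inferred
    // from the call to is_expired below.
    let age = 400;
    is_expired(age)
}
```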

However, of course C has lots of footguns which I can imagine would be worsened with inference, so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

About type inference coming to the C language as well

Posted Dec 10, 2023 22:07 UTC (Sun) by mb (subscriber, #50428)

>so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

Yes, that is true.

Type inference works well in Rust due to its strict type system.
But a subset of Rust's type inference will probably work well in C.

About type inference coming to the C language as well

Posted Dec 10, 2023 23:00 UTC (Sun) by NYKevin (subscriber, #129325)

> However, of course C has lots of footguns which I can imagine would be worsened with inference, so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

I would agree with this. The main concern I can think of is how C handles numeric conversions. They are messy, complicated, and I always have to look them up.[1] They can mostly be summarized as "promote everything to the narrowest type that can represent all values of both argument types, and if an integer, is at least as wide as int," but that summary is wrong (float usually *can't* represent all values of int, but C will just promote int to float anyway). Throwing type inference on top of that mess is probably just going to make things worse.

By contrast, Rust has no such logic. If you add i32 + i16, or any other situation where the types do not match, you just get a flat compiler error.

I do wish Rust would let me write this:

let x: i32 = 1;
let y: i16 = 2;
let z: i32 = x + y.into(); // Compiler error!

(Presumably this is because you can also add i32 + &i32, so the target type of into() is ambiguous, and the compiler isn't quite smart enough to rule out that impl.)

The compiler suggests writing this abomination, which does work:

let z: i32 = x + <i16 as Into<i32>>::into(y);

But at least you can write this:

let x: i32 = 1;
let y: i16 = 2;
let y32: i32 = y.into();
let z: i32 = x + y32;

[1]: https://en.cppreference.com/w/c/language/conversion

About type inference coming to the C language as well

Posted Dec 11, 2023 3:08 UTC (Mon) by NYKevin (subscriber, #129325)

And, after posting this comment, I've realized that the reason into() doesn't work is because you're *supposed* to write this instead:

let z: i32 = x + i32::from(y);

Obviously I need to spend more time studying Rust, or maybe actually sit down and write a toy program in it.

Finally, I should note that you can write "y as i32", but `as` is less safe in general: when the target type is narrower than the source, it silently truncates instead of failing. from() and into() can only do conversions that never lose data, and there's also try_from()/try_into() if you want to handle overflow explicitly.
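A minimal sketch of the three options side by side (the function names are hypothetical): `From` for lossless widening, `as` which compiles for any integer pair and keeps only the low bits, and `TryFrom` which reports out-of-range values.

```rust
// Needed on the 2015/2018 editions; TryFrom is in the prelude from 2021.
use std::convert::TryFrom;

// Lossless widening: i32 implements From<i16>.
fn widen(y: i16) -> i32 {
    i32::from(y)
}

// `as` always compiles and silently truncates to the low 8 bits.
fn narrow_with_as(x: i32) -> u8 {
    x as u8
}

// try_from turns an out-of-range value into an error instead.
fn narrow_checked(x: i32) -> Option<u8> {
    u8::try_from(x).ok()
}
```

For example, 300 does not fit in a u8: `as` yields 44 (300 - 256), while the checked version yields `None`.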

About type inference coming to the C language as well

Posted Dec 11, 2023 13:08 UTC (Mon) by gspr (guest, #91542)

> and there's also try_from()/try_into() if you want to handle overflow explicitly.

And there's try_from().expect("Conversion failure") for those cases where you wanna say "man, I don't really wanna think about this, and I'm sure the one type converts to the other without loss in all cases my program experiences – but if I did overlook something, then at least abort with an error message instead of introducing silent errors".
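A sketch of that pattern using the mirror-image try_into() (the function name is hypothetical). Note that `expect` panics with the given message rather than calling abort directly; with the default panic strategy the panic unwinds, which is what lets the test below observe it.

```rust
// Needed on the 2015/2018 editions; TryInto is in the prelude from 2021.
use std::convert::TryInto;

// "I'm sure this always fits in a u16"; if that's ever wrong,
// fail loudly with a message instead of silently truncating.
fn to_u16(len: usize) -> u16 {
    len.try_into().expect("Conversion failure")
}
```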

About type inference coming to the C language as well

Posted May 8, 2024 15:41 UTC (Wed) by adobriyan (subscriber, #30858)

Rust could allow "u32 = u32 + u8" with its usual overflow checks, but not "u32 + i8".

The "messy numeric conversions" are largely due to "rubber" types (whose width varies between platforms) and the fact that there are lots of them (the 5 main ones, __uint128, size_t, uintptr_t, intmax_t, ptrdiff_t). POSIX doesn't help with off_t.

If all you have is what Rust has, C is not _that_ bad.

The kernel has a certain number of min(x, 1UL) expressions just because x is "unsigned long", when it is clear that the programmer simply wants the constant to have typeof(x).
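For comparison, in Rust an unsuffixed integer literal takes its type from context, so the analogue of min(x, 1UL) needs no suffix. This is a small illustrative sketch; the function name is hypothetical.

```rust
// `std::cmp::min` requires both arguments to have the same type; the
// literal `1` is inferred as u64 from `x`, so no `1u64` suffix is needed.
fn at_most_one(x: u64) -> u64 {
    std::cmp::min(x, 1)
}
```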

About type inference coming to the C language as well

Posted Dec 11, 2023 4:43 UTC (Mon) by swilmet (subscriber, #98424)

It's true that in C++ and Rust, types can be quite long to write.

Both C++ and Rust have a large core language, while C has a small core language.

I see Rust more as a successor to C++. C programmers in general - I think - like the fact that C has a small core language. So in C the types remain small to write, and there are more function calls instead of using sophisticated core language features. C is thus more verbose, and verbosity can be seen as an advantage.

Maybe the solution is to create a SubC language: a subset of C that is safe (or at least safer). That's already partly the case with the compiler options, hardening efforts etc.

About type inference coming to the C language as well

Posted Dec 11, 2023 8:39 UTC (Mon) by NYKevin (subscriber, #129325)

> Maybe the solution is to create a SubC language: a subset of C that is safe (or at least safer). That's already partly the case with the compiler options, hardening efforts etc.

I disagree with this, assuming that "safe" means "cannot cause UB outside of an unsafe block." A safe version of C needs at least the following:

* Lifetimes and borrow checking, which implies a type annotation similar to generics.
* Type inference, or else you have to write lifetime annotations everywhere.
* Box<T> or something equivalent to Box<T>, or else you can't put big objects on the heap and move their ownership around.
* Arc<RwLock<T>> or some equivalent, or else you have no reasonable escape hatch from the borrow checker (other than unsafe blocks).
* Rc<RefCell<T>> or some equivalent, or else you have to use the multithreaded escape hatch even in single-threaded code.
* And then there are many other optimizations such as using Mutex<T> instead of RwLock<T>, or OnceCell<T> instead of RefCell<T>. All of these have valid equivalents in C, and should be possible to represent in our hypothesized "safe C" (without needing more than a minimal amount of unsafe, preferably buried somewhere in the stdlib so that "regular" code can be safe).
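A minimal Rust sketch of three of the building blocks listed above, using only standard-library types (the function names are illustrative): `Box<T>` for single-owner heap data, `Rc<RefCell<T>>` for shared mutation in single-threaded code with borrow rules checked at run time, and `Arc<RwLock<T>>` as the thread-safe escape hatch.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, RwLock};
use std::thread;

// Box: one heap allocation, one owner; passing it in moves ownership.
fn boxed_sum(data: Box<[u64]>) -> u64 {
    data.iter().sum()
}

// Rc<RefCell<T>>: several handles to one value in a single thread.
fn shared_counter_single_thread() -> i32 {
    let counter = Rc::new(RefCell::new(0));
    let alias = Rc::clone(&counter);
    *alias.borrow_mut() += 1;
    *counter.borrow_mut() += 1;
    let result = *counter.borrow();
    result
}

// Arc<RwLock<T>>: the same idea, but safe to share across threads.
fn shared_counter_threads(n: usize) -> usize {
    let counter = Arc::new(RwLock::new(0usize));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || *c.write().unwrap() += 1)
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.read().unwrap();
    total
}
```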

I just don't see how you provide all of that flexibility without doing monomorphization, at which point you're already 80% of the way to reinventing Rust.
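For readers unfamiliar with the term, a small sketch of what monomorphization means (the function is hypothetical): one generic definition, which the compiler instantiates as separate, specialised machine code for each concrete `T` it is used with.

```rust
// One source definition; calling it with i32, f64 and u8 makes the
// compiler emit three separate specialised copies.
fn largest<T: PartialOrd + Copy>(xs: &[T]) -> Option<T> {
    let mut iter = xs.iter().copied();
    let mut best = iter.next()?;
    for x in iter {
        if x > best {
            best = x;
        }
    }
    Some(best)
}
```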

About type inference coming to the C language as well

Posted Dec 11, 2023 11:10 UTC (Mon) by Sesse (subscriber, #53779)

I guess that if you banned threads and pointers (presumably requiring lots of globals) and made all array access bounds-checked and all data zero-initialized, you could get a safe C subset without going there. How useful it would be would be a different thing...

About type inference coming to the C language as well

Posted Dec 11, 2023 13:52 UTC (Mon) by farnz (subscriber, #17727)

If you're not careful, you end up with something like Wuffs. A perfectly useful language in some domains, but deliberately limited in scope to stop you writing many classes of bug.

About type inference coming to the C language as well

Posted Dec 14, 2023 10:55 UTC (Thu) by swilmet (subscriber, #98424)

Seems useful to write command-line programs, for example.

About type inference coming to the C language as well

Posted Dec 14, 2023 10:57 UTC (Thu) by farnz (subscriber, #17727)

You're not going to get very far when you can't access arguments, or do I/O. Wuffs is deliberately limited to not doing that, because it's dangerous to mix I/O with file format parsing.

About type inference coming to the C language as well

Posted Dec 11, 2023 11:35 UTC (Mon) by swilmet (subscriber, #98424)

I'm not an expert in programming language design and security-related matters.

But why not try a C-to-Rust transpiler? (random idea).

It would keep a small core language with the C syntax, plus a new standard library that looks like Rust's but relies on function calls instead of core language features.

The transpiler would treat the new stdlib as part of the language, for performance reasons, and translate those function calls into Rust idioms.

A source-to-source compiler is of course not ideal, but that's how C++ was created ("C with classes" was initially translated to C code).

About type inference coming to the C language as well

Posted Dec 11, 2023 12:09 UTC (Mon) by farnz (subscriber, #17727)

You might want to look at the C2Rust project; the issue is that a clean transpiler to Rust has to use unsafe liberally, since C constructs translate to something that can't be represented in purely Safe Rust.
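To illustrate why, here is a hand-written sketch (not actual C2Rust output) of a C function like `int sum(const int *p, size_t n)` translated literally, next to the idiomatic safe version. The literal translation has to take a raw pointer and a length, so it is `unsafe` and relies on the caller; the slice version carries its length and borrow, so the compiler checks both.

```rust
// Literal translation: raw pointer plus length. Nothing stops a caller
// from passing a dangling pointer or a wrong n, hence `unsafe`.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0;
    for i in 0..n {
        total += *p.add(i);
    }
    total
}

// Idiomatic version: a slice knows its length and is a checked borrow.
fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}
```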

The challenge then becomes adding something like lifetimes (so that you can translate pointers to Rust references instead of Rust raw pointers) without "bloating" C. I suspect that it's impossible to have a tiny core language without pushing many problems into the domain of "the programmer simply must not make any relevant mistakes"; note, though, that this is not bi-directional, since a language with a big core can still push many problems into that domain.

About type inference coming to the C language as well

Posted Dec 12, 2023 10:32 UTC (Tue) by swilmet (subscriber, #98424)

I didn't know C2Rust, it shows that my random idea is not stupid after all :)

But I had the idea to convert (a subset of) C to _safe_ Rust, of course. Instead of some Rust keywords, operators, etc. (the core language), there would be C functions.

Actually the GLib/GObject project is looking to have a Rust-like way of handling things, see:
https://www.bassi.io/articles/2023/08/23/the-mirror/
(but a bit long to read, and one needs to know the GObject world to understand the blog post I think).

Anyway, that's an interesting topic for researchers. Then making it useful and consumable for real-world C projects is yet another task.

About type inference coming to the C language as well

Posted Dec 12, 2023 10:43 UTC (Tue) by farnz (subscriber, #17727)

The hard part is not the keywords and operators - it's the lifetime annotation system. Lifetimes are a check on what the programmer intended, so they have to be writable as an annotation on pointer types in the C-derived language; but then, to be usable, they force you to have a generics system (since you want many things to be generic over a lifetime) in which (at least) covariance and invariance can be expressed.
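A small sketch of what "generic over a lifetime" looks like in practice (all names are hypothetical). `Tokens<'a>` borrows from its input, and the words it returns are tied to that input's lifetime `'a` rather than to the tokenizer itself. `longer` relies on covariance: a `&'static str` can be passed where a shorter-lived `&'a str` is expected.

```rust
// A struct that is generic over a lifetime: it borrows its input, and
// the compiler checks that it never outlives the string it points into.
struct Tokens<'a> {
    rest: &'a str,
}

impl<'a> Tokens<'a> {
    fn new(input: &'a str) -> Self {
        Tokens { rest: input }
    }

    // The returned word borrows from the original input ('a), not from self.
    fn next_word(&mut self) -> Option<&'a str> {
        let trimmed = self.rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let end = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        let (word, rest) = trimmed.split_at(end);
        self.rest = rest;
        Some(word)
    }
}

// &'a str is covariant in 'a, so a &'static str argument is shortened
// to the same 'a as the other argument.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}
```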

And once you have a generics system that can express covariance and invariance for each item in a set of generic parameters, why wouldn't you allow that to be used for types as well as lifetimes? At which point, you have Rust traits and structs, and most of the complexity of Rust.

About type inference coming to the C language as well

Posted Dec 12, 2023 11:34 UTC (Tue) by mb (subscriber, #50428)

>But I had the idea to convert (a subset of) C to _safe_ Rust, of course.

That is not possible, except for very trivial cases.

C code neither includes enough information (e.g. lifetimes) for that to work, nor is it usually structured in a way that would allow it.

Programming in Rust requires a different way of thinking and a different way of structuring your code. An automatic translation of typical idiomatic C programs will fail so badly that it would be easier to rewrite them from scratch than to translate them and then fix the compile failures.

About type inference coming to the C language as well

Posted Dec 13, 2023 23:59 UTC (Wed) by swilmet (subscriber, #98424)

The C syntax alone is not enough, but comments with annotations can be added, and become part of the language.

I started to learn Rust but dislike the fact that it has many core features ("high-level ergonomics"). It's probably possible to use Rust in a simplistic way though, except maybe if a library forces you to use the fancy features.

About type inference coming to the C language as well

Posted Dec 14, 2023 9:37 UTC (Thu) by farnz (subscriber, #17727)

You could avoid using those libraries, and limit yourself to libraries that have a "simple" enough interface for you (no_std libraries are a good thing to look for here, since they're designed with just core and maybe alloc in mind, not the whole of std) - bearing in mind that you don't need to care how those libraries are implemented if it's just about personal preference.

In general, though, I wouldn't be scared of a complex core language - all of that complexity has to be handled somewhere, and a complex core language can mean that complexity is being compiler-checked instead of human-checked.

About type inference coming to the C language as well

Posted Dec 14, 2023 11:07 UTC (Thu) by swilmet (subscriber, #98424)

The codebases that I maintain already use between two and three/four main programming languages (welcome to GNOME, I should say). At some point I wanted to write new code in Rust, but it means adding more complexity and being less productive for some time while learning the language.

"Soft"ware, they said :-)

About type inference coming to the C language as well

Posted Dec 10, 2023 12:03 UTC (Sun) by Wol (subscriber, #4433)

> In my opinion, type inference for variable declarations should be used only sparingly, when the type of the variable is already visible (and quite long to write) on the right-hand side of the assignment. Writing the types of variables explicitly enhance code comprehension.

Have a variable type of "infer"? That way, an undeclared variable is still an error, but you can explicitly tell the compiler to decide for itself :-)
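Rust offers something close to this. Every variable must still be introduced with `let`, so an undeclared name remains an error, and `_` in a type position explicitly asks the compiler to infer just that part of the type. A small sketch (the function name is hypothetical):

```rust
use std::collections::BTreeSet;

// The container type (BTreeSet) is stated; its element type is left as
// `_` and inferred as String from the iterator feeding `collect`.
fn unique_sorted(words: &[&str]) -> Vec<String> {
    let set: BTreeSet<_> = words.iter().map(|w| w.to_string()).collect();
    set.into_iter().collect()
}
```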

Cheers,
Wol