About type inference coming to the C language as well
Posted Dec 10, 2023 8:57 UTC (Sun) by swilmet (subscriber, #98424)
In reply to: Modern C for Fedora (and the world) by willy
Parent article: Modern C for Fedora (and the world)
In my opinion, type inference for variable declarations should be used only sparingly, when the type of the variable is already visible (and quite long to write) on the right-hand side of the assignment. Writing the types of variables explicitly enhances code comprehension.
See the article that I wrote last night after reading this LWN article: About type inference
(the article is 2 pages long, a bit too long to copy here as a comment, I suppose).
Posted Dec 10, 2023 9:00 UTC (Sun)
by swilmet (subscriber, #98424)
[Link]
Posted Dec 10, 2023 11:52 UTC (Sun)
by excors (subscriber, #95769)
[Link] (20 responses)
Like, using `auto` instead of `const char*` or `ArrayList<String>` isn't a huge benefit, because those are pretty simple types. But when you're regularly writing code like:
for (std::map<std::string, std::string>::iterator it = m.begin(); it != m.end(); ++it) { ... }
then it gets quite annoying, since the type name makes up half the line, and it obscures the high-level intent of the code (which is simply to iterate over `m`). (And that's not the real type anyway; `std::string` is the templated `std::basic_string<char>`, and the `iterator` is a typedef which is documented to be a LegacyBidirectionalIterator which is a LegacyForwardIterator which is a LegacyIterator which specifies the `++it` operation etc, so in practice you're not going to figure out how the type behaves from the documentation - you're really going to need a type-aware text editor or IDE, at least until you've memorised enough of the typical library usage patterns. That's just an obligatory part of modern programming.)
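A rough Rust analogue of the iterator loop above, as a minimal sketch: `join_explicit` and `join_inferred` are hypothetical helper names, and `BTreeMap` merely stands in for `std::map`. The first version spells out the iterator type by hand; the second leaves it to inference.

```rust
use std::collections::btree_map;
use std::collections::BTreeMap;

// Hypothetical helper: the iterator type is written out in full.
fn join_explicit(m: &BTreeMap<String, String>) -> String {
    let mut out = String::new();
    let mut it: btree_map::Iter<'_, String, String> = m.iter();
    while let Some((k, v)) = it.next() {
        out.push_str(k);
        out.push('=');
        out.push_str(v);
        out.push(';');
    }
    out
}

// Same loop, with the iterator type left to the compiler.
fn join_inferred(m: &BTreeMap<String, String>) -> String {
    let mut out = String::new();
    for (k, v) in m {
        out.push_str(k);
        out.push('=');
        out.push_str(v);
        out.push(';');
    }
    out
}
```

Both functions produce the same string; only the second keeps the intent ("iterate over `m`") in the foreground.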
Or in Rust you might rely on type inference like:
let v = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
where you can see the important information (that it ends up with a vector of ints), and you can assume `v` is some sort of iterable thing but you don't care exactly what. Writing it explicitly would be something terrible like:
let v: std::iter::Map<std::str::SplitAsciiWhitespace<'_>, impl Fn(&str) -> i32> = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
except that won't actually work because the `'_` is referring to a lifetime which I don't think there is any way to express in code; and the closure is actually an anonymous type (constructed by the compiler to contain any captured variables) which implements the `Fn` trait, and you can only use the `impl Trait` syntax in argument types (where it's a form of generics) and return types (where it's a kind of information hiding), not in variable bindings, so there's no way to name the closure type. Rust's statically-checked lifetimes and non-heap-allocated closures are useful features that simply can't work without type inference.
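To make the point above concrete, here is a small self-contained sketch. The function names `parse_ints` and `ints` are made up for illustration; the bodies follow the snippets in this comment. The second function shows the one place mentioned above where `impl Trait` can stand in for the unnameable type: a return type.

```rust
// Hypothetical wrapper around the two statements discussed above.
fn parse_ints(line: &str) -> Vec<i32> {
    // The type of `v` cannot be written out: it contains a closure type.
    let v = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
    // This single annotation is enough for the compiler to work out
    // that `parse` has to produce i32 values.
    let vals: Vec<i32> = v.collect();
    vals
}

// In return position, `impl Trait` hides the concrete iterator type.
// The turbofish on `parse` is only there to keep this sketch explicit.
fn ints<'a>(line: &'a str) -> impl Iterator<Item = i32> + 'a {
    line.split_ascii_whitespace().map(|s| s.parse::<i32>().unwrap())
}
```

Calling `parse_ints("1 2 3")` gives a `Vec<i32>` holding 1, 2 and 3; `ints("4 5")` gives an iterator that the caller can consume without ever naming its type.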
Posted Dec 10, 2023 21:52 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (5 responses)
Sure, I have no idea what "type" chars actually is, but it's clearly some sort of Iterator, and somebody named it chars, I feel entitled to assume it impl Iterator<Item = char> unless it's obvious in context that it doesn't.
If anything I think I more often resent needing to spell out types for e.g. constants where I'm obliged to specify that const MAX_EXPIRY: MyDayType = 398; rather than let the compiler figure out that's the only correct type. I don't hate that enough to think it should be changed, it makes sense, but I definitely run into it more often than I regret not knowing the type of chars in a construction like let chars = foo.bar().into_iter()
However, of course C has lots of footguns which I can imagine would be worsened with inference, so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.
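A minimal sketch of the two situations described above. `MyDayType` is defined here as an alias for `u16` purely as an assumption for the example, and `is_expired` and `count_uppercase` are hypothetical names.

```rust
// Assumption for this sketch: the day type is a plain u16.
type MyDayType = u16;

// A const must always carry an explicit type, even when only one fits.
const MAX_EXPIRY: MyDayType = 398;

fn is_expired(age_days: MyDayType) -> bool {
    age_days > MAX_EXPIRY
}

fn count_uppercase(text: &str) -> usize {
    // Inferred as std::str::Chars; what matters to the reader is only
    // that it is an iterator yielding char.
    let chars = text.chars();
    chars.filter(|c| c.is_uppercase()).count()
}
```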
Posted Dec 10, 2023 22:07 UTC (Sun)
by mb (subscriber, #50428)
[Link]
Yes, that is true.
Type inference works well in Rust due to its strict type system.
Posted Dec 10, 2023 23:00 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
I would agree with this. The main concern I can think of is how C handles numeric conversions. They are messy, complicated, and I always have to look them up.[1] They can mostly be summarized as "promote everything to the narrowest type that can represent all values of both argument types, and if an integer, is at least as wide as int," but that summary is wrong (float usually *can't* represent all values of int, but C will just promote int to float anyway). Throwing type inference on top of that mess is probably just going to make things worse.
By contrast, Rust has no such logic. If you add i32 + i16, or any other situation where the types do not match, you just get a flat compiler error.
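The parenthetical about float can be checked with a short sketch. It assumes that C's float is the same 32-bit IEEE 754 format as Rust's `f32`, which is the usual case but not guaranteed by the C standard; `through_f32` is a hypothetical name.

```rust
// Round-trips an i32 through f32. In Rust every step must be spelled out
// with `as`; writing `1_i32 + 2.0_f32` is rejected by the compiler.
fn through_f32(n: i32) -> i32 {
    // f32 has a 24-bit significand, so large integers get rounded here.
    let f = n as f32;
    f as i32
}
```

With this, `through_f32(16_777_216)` returns its input unchanged, while `through_f32(16_777_217)` comes back as 16_777_216: the value 2^24 + 1 has no exact `f32` representation.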
I do wish Rust would let me write this:
let x: i32 = 1;
let y: i16 = 2;
let z: i32 = x + y.into(); // Compiler error!
(Presumably this is because you can also add i32 + &i32, and the compiler isn't quite smart enough to rule out that override.)
The compiler suggests writing this abomination, which does work:
let z: i32 = x + <i16 as Into<i32>>::into(y);
But at least you can write this:
let x: i32 = 1;
let y: i16 = 2;
let y32: i32 = y.into();
let z: i32 = x + y32;
Posted Dec 11, 2023 3:08 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
let z: i32 = x + i32::from(y);
Obviously I need to spend more time studying Rust, or maybe actually sit down and write a toy program in it.
Finally, I should note that you can write "y as i32", but that's less safe because it will silently do a narrowing conversion. from() and into() can only do conversions that never lose data, and there's also try_from()/try_into() if you want to handle overflow explicitly.
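The three conversion styles mentioned above, side by side, as a minimal sketch (the function names are hypothetical):

```rust
use std::convert::TryFrom; // already in the prelude on the 2021 edition

// Lossless widening: from() only exists for conversions that cannot lose data.
fn widen(y: i16) -> i32 {
    i32::from(y)
}

// `as` silently keeps only the low 16 bits when the value does not fit.
fn truncate(x: i32) -> i16 {
    x as i16
}

// try_from() reports the overflow instead of hiding it.
fn narrow_checked(x: i32) -> Option<i16> {
    i16::try_from(x).ok()
}
```

For example, `truncate(70_000)` yields 4464 (70000 - 65536), whereas `narrow_checked(70_000)` yields `None`.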
Posted Dec 11, 2023 13:08 UTC (Mon)
by gspr (guest, #91542)
[Link] (1 responses)
And there's try_from().expect("Conversion failure") for those cases where you wanna say "man, I don't really wanna think about this, and I'm sure the one type converts to the other without loss in all cases my program experiences – but if I did overlook something, then at least abort with an error message instead of introducing silent errors".
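A short sketch of that pattern; `to_port` is a made-up name and the choice of `i64` and `u16` is arbitrary:

```rust
use std::convert::TryFrom; // already in the prelude on the 2021 edition

// Panics with the given message if the value does not fit, instead of
// silently wrapping the way `n as u16` would.
fn to_port(n: i64) -> u16 {
    u16::try_from(n).expect("Conversion failure")
}
```

`to_port(8080)` returns 8080; `to_port(70_000)` aborts the thread with a panic that includes the "Conversion failure" message.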
Posted May 8, 2024 15:41 UTC (Wed)
by adobriyan (subscriber, #30858)
[Link]
The "messy numeric conversions" are largely due to rubber types and the fact that there are lots of them
If all you have is what Rust has, C is not _that_ bad.
The kernel has a certain number of min(x, 1UL) expressions just because x is "unsigned long", but it is clear that the programmer wants typeof(x).
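For comparison only, a sketch of how the same expression looks in Rust, where an unsuffixed literal takes its type from the other operand. `u64` stands in for "unsigned long" here, which is an assumption that holds on typical 64-bit Linux targets; `at_most_one` is a hypothetical name.

```rust
// The literal `1` is inferred to be u64 because `x` is u64,
// so no suffix equivalent to C's `1UL` is needed.
fn at_most_one(x: u64) -> u64 {
    std::cmp::min(x, 1)
}
```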
Posted Dec 11, 2023 4:43 UTC (Mon)
by swilmet (subscriber, #98424)
[Link] (13 responses)
Both C++ and Rust have a large core language, while C has a small core language.
I see Rust more as a successor to C++. C programmers in general - I think - like the fact that C has a small core language. So in C the types remain small to write, and there are more function calls instead of using sophisticated core language features. C is thus more verbose, and verbosity can be seen as an advantage.
Maybe the solution is to create a SubC language: a subset of C that is safe (or at least safer). That's already partly the case with the compiler options, hardening efforts etc.
Posted Dec 11, 2023 8:39 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (12 responses)
I disagree with this, assuming that "safe" means "cannot cause UB outside of an unsafe block." A safe version of C needs at least the following:
* Lifetimes and borrow checking, which implies a type annotation similar to generics.
I just don't see how you provide all of that flexibility without doing monomorphization, at which point you're already 80% of the way to reinventing Rust.
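As a small illustration of how lifetimes and generics end up sharing one annotation syntax, here is a sketch with a hypothetical function `larger`. It is generic over both a lifetime `'a` and a type `T`, and the compiler monomorphizes it for each `T` it is called with.

```rust
// Both references must be valid for the same lifetime 'a, and the result
// is tied to that lifetime. T is filled in separately at each call site.
fn larger<'a, T: PartialOrd>(a: &'a T, b: &'a T) -> &'a T {
    if b > a {
        b
    } else {
        a
    }
}
```

Calling it as `larger(&3, &7)` and as `larger(&"pear", &"apple")` makes the compiler generate one copy for integers and one for string slices.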
Posted Dec 11, 2023 11:10 UTC (Mon)
by Sesse (subscriber, #53779)
[Link] (3 responses)
Posted Dec 11, 2023 13:52 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (2 responses)
If you're not careful, you end up with something like Wuffs. A perfectly useful language in some domains, but deliberately limited in scope to stop you writing many classes of bug.
Posted Dec 14, 2023 10:55 UTC (Thu)
by swilmet (subscriber, #98424)
[Link] (1 responses)
Posted Dec 14, 2023 10:57 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
You're not going to get very far when you can't access arguments, or do I/O. Wuffs is deliberately limited to not doing that, because it's dangerous to mix I/O with file format parsing.
Posted Dec 11, 2023 11:35 UTC (Mon)
by swilmet (subscriber, #98424)
[Link] (7 responses)
But why not try a C-to-Rust transpiler? (random idea).
By keeping a small core language with the C syntax, and having a new standard library that looks like Rust but uses more function calls instead.
The transpiler would "take" the new stdlib as part of the language, for performance reasons, and translate the function calls to Rust idioms.
A source-to-source compiler is of course not ideal, but that's how C++ was created ("C with classes" was initially translated to C code).
Posted Dec 11, 2023 12:09 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (6 responses)
You might want to look at the C2Rust project; the issue is that a clean transpiler to Rust has to use unsafe liberally, since C constructs translate to something that can't be represented in purely Safe Rust.
The challenge then becomes adding something like lifetimes (so that you can translate pointers to Rust references instead of Rust raw pointers) without "bloating" C. I suspect that it's impossible to have a tiny core language without pushing many problems into the domain of "the programmer simply must not make any relevant mistakes"; note, though, that this is not bi-directional, since a language with a big core can still push many problems into that domain.
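To illustrate the gap between the two styles, here is a hand-written sketch. It is not actual C2Rust output, and the function names are hypothetical. The first function mirrors a C-style "pointer plus length" interface and needs `unsafe`; the second expresses the same loop with a slice reference, which the compiler can check.

```rust
// C-style interface: nothing ties `n` to the allocation behind `p`,
// so the caller has to promise that `p` points to at least `n` values.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0;
    for i in 0..n {
        total += unsafe { *p.add(i) };
    }
    total
}

// Idiomatic interface: the slice carries its length and its lifetime.
fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}
```

Getting from the first form to the second automatically is the hard part, because the information that `p` and `n` belong together is not present in the C source.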
Posted Dec 12, 2023 10:32 UTC (Tue)
by swilmet (subscriber, #98424)
[Link] (5 responses)
But I had the idea to convert (a subset of) C to _safe_ Rust, of course. Instead of some Rust keywords, operators etc (the core language), have C functions instead.
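Actually the GLib/GObject project is looking to have a Rust-like way of handling things, see:
https://www.bassi.io/articles/2023/08/23/the-mirror/
(but a bit long to read, and one needs to know the GObject world to understand the blog post I think).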
Anyway, that's an interesting topic for researchers. Then making it useful and consumable for real-world C projects is yet another task.
Posted Dec 12, 2023 10:43 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
The hard part is not the keywords and operators - it's the lifetime annotation system. Lifetimes are a check on what the programmer intended, so have to be possible to write as an annotation to pointer types in the C derived language, but then to be usable force you to have a generics system (since you want many things to be generic over a lifetime) with (at least) covariance and invariance possible to express.
And once you have a generics system that can express covariance and invariance for each item in a set of generic parameters, why wouldn't you allow that to be used for types as well as lifetimes? At which point, you have Rust traits and structs, and most of the complexity of Rust.
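A minimal sketch of what "generic over a lifetime" looks like in practice, and of covariance at work. `Window`, `shorter` and `demo` are hypothetical names invented for this example.

```rust
// One struct, generic over both a lifetime and a type, using one syntax.
struct Window<'a, T> {
    items: &'a [T],
}

impl<'a, T> Window<'a, T> {
    // The returned reference lives as long as the borrowed slice,
    // not merely as long as the Window value itself.
    fn first(&self) -> Option<&'a T> {
        self.items.first()
    }
}

// Both arguments must share one lifetime 'a.
fn shorter<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() <= b.len() {
        a
    } else {
        b
    }
}

fn demo() -> usize {
    let local = String::from("hi");
    let fixed: &'static str = "static";
    // Covariance: the &'static str is accepted where the shorter
    // lifetime of `local` is required.
    let n = shorter(fixed, &local).len();
    n
}
```

Invariance is the opposite case (for example behind `&mut`), where the compiler refuses to shorten or lengthen the lifetime; that side only shows up as a compile error, so it is not demonstrated here.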
Posted Dec 12, 2023 11:34 UTC (Tue)
by mb (subscriber, #50428)
[Link] (3 responses)
That is not possible, except for very trivial cases.
The C code neither includes enough information (e.g. lifetimes) for that to work, nor is it usually structured in a way that would allow it.
Programming in Rust requires a different way of thinking and a different way of structuring your code. An automatic translation of the usual idiomatic C programs will fail so hard that it would be easier to rewrite them from scratch instead of translating them and then fixing the compile failures.
Posted Dec 13, 2023 23:59 UTC (Wed)
by swilmet (subscriber, #98424)
[Link] (2 responses)
I started to learn Rust but dislike the fact that it has many core features ("high-level ergonomics"). It's probably possible to use Rust in a simplistic way though, except maybe if a library forces you to use the fancy features.
Posted Dec 14, 2023 9:37 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (1 responses)
You could avoid using those libraries, and limit yourself to libraries that have a "simple" enough interface for you (no_std libraries are a good thing to look for here, since they're designed with just core and maybe alloc in mind, not the whole of std) - bearing in mind that you don't need to care how those libraries are implemented if it's just about personal preference.
In general, though, I wouldn't be scared of a complex core language - all of that complexity has to be handled somewhere, and a complex core language can mean that complexity is being compiler-checked instead of human-checked.
Posted Dec 14, 2023 11:07 UTC (Thu)
by swilmet (subscriber, #98424)
[Link]
"Soft"ware, they said :-)
Posted Dec 10, 2023 12:03 UTC (Sun)
by Wol (subscriber, #4433)
[Link]
Have a variable type of "infer"? That way, an undeclared variable is still an error, but you can explicitly tell the compiler to decide for itself :-)
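For what it's worth, Rust already has something close to this idea: the `_` placeholder explicitly asks the compiler to fill in a type, while an undeclared variable remains an error. A minimal sketch, with `squares` as a hypothetical name:

```rust
fn squares(n: u32) -> Vec<u64> {
    // `Vec<_>` says "a vector of something, please work out what".
    // The element type is inferred as u64 from the closure.
    let out: Vec<_> = (1..=n).map(|i| u64::from(i) * u64::from(i)).collect();
    out
}
```

`squares(3)` yields a vector holding 1, 4 and 9.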
Cheers,
let vals: Vec<i32> = v.collect();
let vals: Vec<i32> = v.collect();
But a subset of Rust's type inference will probably work well in C.
let y: i16 = 2;
let z: i32 = x + y.into(); // Compiler error!
let y: i16 = 2;
let y32: i32 = y.into();
let z: i32 = x + y32;
(5 main, __uint128, size_t, uintptr_t, intmax_t, ptrdiff_t). POSIX doesn't help with off_t.
* Type inference, or else you have to write lifetime annotations everywhere.
* Box<T> or something equivalent to Box<T>, or else you can't put big objects on the heap and move their ownership around.
* Arc<RwLock<T>> or some equivalent, or else you have no reasonable escape hatch from the borrow checker (other than unsafe blocks).
* Rc<RefCell<T>> or some equivalent, or else you have to use the multithreaded escape hatch even in single-threaded code.
* And then there are many other optimizations such as using Mutex<T> instead of RwLock<T>, or OnceCell<T> instead of RefCell<T>. All of these have valid equivalents in C, and should be possible to represent in our hypothesized "safe C" (without needing more than a minimal amount of unsafe, preferably buried somewhere in the stdlib so that "regular" code can be safe).
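The library types named in this list can be sketched in a few lines. The function names are hypothetical and the example is deliberately tiny; it only shows which type plays which role.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, RwLock};
use std::thread;

// Box<T>: a heap allocation whose ownership can be moved around cheaply.
fn boxed_move() -> usize {
    let big = Box::new([0u8; 4096]);
    let moved = big; // moves the pointer, not the 4096 bytes
    moved.len()
}

// Rc<RefCell<T>>: shared, mutable state in single-threaded code.
fn shared_single_threaded() -> i32 {
    let counter = Rc::new(RefCell::new(0));
    let alias = Rc::clone(&counter);
    *alias.borrow_mut() += 1;
    let value = *counter.borrow();
    value
}

// Arc<RwLock<T>>: the same idea, usable across threads.
fn shared_multi_threaded() -> i32 {
    let counter = Arc::new(RwLock::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                *c.write().unwrap() += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let value = *counter.read().unwrap();
    value
}
```

None of this code needs `unsafe` at the call site; the unsafe parts are buried inside the standard library, which is the arrangement the last bullet asks for.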
https://www.bassi.io/articles/2023/08/23/the-mirror/
(but a bit long to read, and one needs to know the GObject world to understand the blog post I think).
Wol