It can still crash
Posted Feb 10, 2025 18:15 UTC (Mon) by mb (subscriber, #50428)
Parent article: Maintainer opinions on Rust-for-Linux
That's not correct. Crashing is safe. But it's really a bad idea to crash the kernel.
I think the more obvious advantage of the Rust version of "f" is that the references are guaranteed to point to valid memory. Which is not the case for the C variant.
Posted Feb 10, 2025 18:40 UTC (Mon)
by jengelh (guest, #33263)
[Link] (3 responses)
Posted Feb 11, 2025 20:31 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
* A C++ reference must be initialized to point at something, and cannot be NULL (or nullptr).
There is nothing which prevents you from destroying the pointee out from under the reference, which causes UB if you use the reference afterwards, so it is unsafe. It does prevent a few potentially dangerous patterns such as null pointers and pointers that get changed at runtime, but that is not the same thing as memory safety.
Posted Feb 13, 2025 2:14 UTC (Thu)
by milesrout (subscriber, #126894)
[Link] (1 responses)
Posted Feb 14, 2025 12:01 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link]
This will also happen if you use an operator; take AddAssign, the trait implementing the += operator:
description += " and then I woke up.";
...is eventually just AddAssign::add_assign(&mut description, " and then I woke up.");
So if we lent out any reference to description that's still outstanding, we can't do this, likewise if we only have an immutable reference we can't use that to do this either. But the visible syntax shows no sign that this is the case.
In practice because humans are fallible, the most important thing is that this is checked by the compiler. It's valuable to know that my_function takes a mutable reference, when you're writing the software, when you're reading software other people wrote, and most definitely when re-reading your own software - but it's *most* valuable that the compiler checks because I might miss that, all three times.
Posted Feb 10, 2025 18:43 UTC (Mon)
by ncultra (✭ supporter ✭, #121511)
[Link] (92 responses)
Posted Feb 10, 2025 19:05 UTC (Mon)
by daroc (editor, #160859)
[Link] (91 responses)
So yes, an incorrect value returned to the rest of the kernel could cause a crash later. And the logic can always be incorrect. But safe Rust is never going to cause a null pointer dereference — which there's no good way to annotate in C. There are a number of properties like that.
Posted Feb 10, 2025 19:18 UTC (Mon)
by mb (subscriber, #50428)
[Link] (32 responses)
That way certain parts of the program can be guaranteed to not crash, because range checks have already been done at one earlier point and that assurance is passed downstream via the type system, for example.
But that needs a bit more complex example to show.
Posted Feb 11, 2025 17:18 UTC (Tue)
by acarno (subscriber, #123476)
[Link] (31 responses)
That said - the properties you encode in Ada aren't quite equivalent (to my understanding) to those in Rust. Ada's focus is more on mathematical correctness, Rust's focus is more on concurrency and memory correctness.
Posted Feb 11, 2025 18:28 UTC (Tue)
by mb (subscriber, #50428)
[Link] (30 responses)
For a simple example see https://doc.rust-lang.org/std/path/struct.Path.html
There are many much more complex examples in crates outside of the std library where certain operations cause objects to become objects of other types because the logical/mathematical properties change. This is possible because move semantics consume the original object, so that it's impossible to keep using it. And with the new object type it's impossible to do the "old things" of the previous type.
Posted Feb 11, 2025 21:37 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 12, 2025 0:06 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (28 responses)
1. If some precondition holds, you either already have an instance of that type, or can easily get one.
To avoid repeating myself over and over again: Assume that all of these rules are qualified with "..., unless you use some kind of escape hatch, like a type cast, reflection, unsafe, etc."
(1) and (4) are doable in nearly every reasonable programming language. (2) is possible in most languages that have visibility specifiers like pub or private. (5) is impossible in most "managed" languages because they (usually) require objects to have some minimal runtime gunk for purposes such as garbage collection and dynamic dispatch, but systems languages should be able to do at least the zero-cost wrapper without any difficulty, and managed languages may have smarter optimizers than I'm giving them credit for.
(3) is the real sticking point for most languages. It is pretty hard to do (3) without some kind of substructural typing, and that is not something that appears in a lot of mainstream programming languages besides Rust.
Just to briefly explain what this means:
* A "normal" type may be used any number of times (in any given scope where a variable of that type exists). Most types, in most programming languages, are normal. Substructural typing refers to the situation where at least one type is not normal.
It can be argued that Rust's types are affine at the level of syntax, but ultimately desugar into linear types because of the drop glue (i.e. the code emitted to automatically drop any object that goes out of scope, as well as all of its constituent parts recursively). If there were an option to "opt out" of generating drop glue for a given type (and fail the compilation if the type is used in a way that would normally generate drop glue), then Rust would have true linear types, but there are a bunch of small details that need to be worked out before this can be done.
Posted Feb 12, 2025 20:02 UTC (Wed)
by khim (subscriber, #9252)
[Link] (10 responses)
Ughm… before it would be usable you wanted to say? Isn't that trivial? Like this The big question is: what to do about
Posted Feb 12, 2025 20:14 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (9 responses)
Posted Feb 12, 2025 20:30 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link]
Posted Feb 12, 2025 20:30 UTC (Wed)
by khim (subscriber, #9252)
[Link] (7 responses)
Isn't that how linear types would work in any other language, too? Precisely because of the halting problem, such systems have to reject certain programs that are, in fact, valid. That's what Rust does with references, too. And as usual you may solve the issue with If you would attempt to use this code, you'll find out that the biggest issue is not this purely theoretical problem, but a much more practical one: lots of crates assume that types are affine, and so many code fragments that “should be fine” in reality generate drops in failure cases. Not in the sense “the compiler has misunderstood something and decided to materialize a drop that's not needed there”, but in the sense “the only reason drop shouldn't be reached at runtime here is because of properties that the compiler doesn't even know about and couldn't verify”. An unwinding `panic!` is the most common offender, but there are many others. IOW: the problem with linear types in Rust is not finding a better syntax for them, but deciding what to do with the millions of lines of code that are already written… and that are not really compatible with linear types.
Posted Feb 12, 2025 22:02 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (6 responses)
Posted Feb 12, 2025 22:18 UTC (Wed)
by khim (subscriber, #9252)
[Link] (5 responses)
But that's precisely what I'm talking about. If you would take this exact Linear types are easy to achieve in Rust… but then you need to rewrite more-or-less all I don't see how changes to the language may fix that. If you go with linking tricks then you can cheat a bit, but “real” linear types, if they were added to the language, would face the exact same issue that we see here: the compiler would need to prove that it doesn't need drop in exactly the same places, and would face exactly the same issues with that. IOW: changing the compiler to make “real” linear types work and changing the compiler to make static_assert-based linear types work are not two different kinds of work but, in fact, exactly the same work. If someone is interested in bringing them to Rust, then taking the implementation that already exists and looking at the changes needed to support it would be much better than discussions about the proper syntax.
Posted Feb 12, 2025 22:42 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (4 responses)
> IOW: changing compiler to make “real” linear types and changing the compiler to make static_assert-based linear types work are not two different kinds of work, but, in fact, exactly the same work.
Currently, the reference[1] says that this is allowed to fail at compile time:
if false {
    const { panic!() }
}
I agree that, if a given type T is manifestly declared as linear, then the compiler does have to prove that <T as Drop>::drop is never invoked. But I tend to assume that no such guarantee will be provided for arbitrary T, because Rust compile times are already too slow as it is, so you will have to use whatever bespoke syntax they provide for doing that, or else it will continue to be just as brittle as it is now.
Speaking of bespoke syntax, the most "obvious" spelling would be impl !Drop for T{} (like Send and Sync). But if you impl !Drop, then type coherence means you can't impl Drop, and therefore can't put a const assert in its implementation. Maybe they will use a different spelling, to allow for (e.g.) drop-on-unwinding-panic to happen (as a pragmatic loophole to avoid breaking too much code), but then the const assert probably won't work either (because the compiler will not attempt to prove that the panic drop glue is never invoked).
[1]: https://doc.rust-lang.org/reference/expressions/block-exp...
Posted Feb 12, 2025 23:02 UTC (Wed)
by khim (subscriber, #9252)
[Link] (3 responses)
C++17 solved that with if constexpr, which has such guarantees. Why is it not a problem for C++, then? No. There is no need for that. Unlike traits resolution And yes, C++ has a rule that if something couldn't be instantiated, then it's not a compile-time error as long as that instantiation is not needed. That includes destructors. It has been there since day one, that is, from C++98, and while MSVC was notoriously bad at following that rule, clang was always pretty precise. If some instantiations of that To a large degree it's a chicken-and-egg issue: C++ has, essentially, built all its advanced techniques around SFINAE and thus compilers learned to handle it well; in Rust very few developers even know or care that its analogue exists in the language, thus it's not handled correctly in many cases. But no, it's not a matter of complicated math or something that should slow down the compilation; on the contrary, it's something that's relatively easy to implement: C++ is an existential proof. Yes, but, ironically enough, that would require a lot of work, because if you do that then you are elevating the whole thing to the level of types, errors are detected pre-monomorphization, and then it's no longer an issue of implementing things carefully but becomes a type-system problem.
Posted Feb 13, 2025 0:14 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
It has to be in the type system in some form, or else generic code cannot coherently interact with it (unless we want to throw away literally all code that drops anything and start over with a system where drop glue is not a thing - but that's so obviously a non-starter that I cannot imagine you could be seriously proposing it). We do not want to recreate the C++ situation where all the types match, but then something blows up in monomorphization.
To be more concrete: Currently, in Rust, you can (mostly) figure out whether a given generic specialization is valid by reading its trait bounds. While it is possible to write a const block that imposes additional constraints, this is usually used for things like FFI and other cases where the type system doesn't "know" enough to stop us from doing something dangerous. Outside of those special cases (which typically make heavy use of unsafe), the general expectation is that if a type matches a given set of trait bounds, then I can use it as such.
This is useful for pedagogical reasons, but it is also just a heck of a lot more convenient. When I'm writing concrete code, I don't have to grep for static_assert or read compiler errors to figure out what methods I'm allowed to call. I can just read the trait bounds in the rustdoc and match them against my concrete types. When I'm writing generic code, I don't have to instantiate a bunch of specializations to try and shake out compiler errors one type at a time. The compiler will check the bounds for me and error out if I write something incoherent, even before I write any tests for my code.
But trait bounds are more than just a convenient linting service. They are a semver promise. If I have a function foo<T: Send>(t: T), I am not allowed to change it to foo<T: Send + Sync>(t: T) without breaking backwards compatibility. If you wrote foo::<RefCell>(cell) somewhere in your code, I have promised that that will continue to work in future releases, even if I never specifically thought about RefCell. Droppability completely breaks this premise. If droppability is only determined at monomorphization, then I can write a function bar<T>(t: T) -> Wrapper<T> (for some kind of Wrapper type) that does not drop its argument, and then later release a new version that has one unusual code path where the argument does get dropped (by one of its transitive dependencies, just to make the whole thing harder to troubleshoot). Under Rust as it currently exists, that is not a compatibility break, and it would be very bad if it was. We would have to audit every change to every generic function for new drop glue, or else risk breaking users of undroppable types. Nobody is actually going to do that, so we're simply not going to be semver compliant in this hypothetical.
Posted Feb 13, 2025 8:00 UTC (Thu)
by khim (subscriber, #9252)
[Link] (1 responses)
Well… it's like Greenspun's tenth rule. Rust didn't want to “recreate the C++ situation” and as a result it just made a bad, non-functional copy. As you saw, “something blows up in monomorphization” is already possible, only it's unreliable, has bad diagnostics and is in all aspects worse than C++. Perfect is the enemy of good, and this story is a great illustration, IMNSHO. Yeah. A great/awful property which works nicely most of the time, but falls to pieces when you really try to push. It's convenient till you need to write 566 methods instead of 12. At this point it becomes both a PITA and compilation times skyrocket. But when these bounds are not there from the beginning, they become more of a “semver rejection”. That's why we still have no support for lending iterators in At some point you have to accept that your language couldn't give you a perfect solution and can only give you a good one. Rust developers haven't accepted that yet and thus we don't have solutions for many issues at all. And when the choice is between “that's impossible” and “that's possible, but with caveats”, practical people pick the latter option. Except, as you have already demonstrated, that's not true: this capability already exists in Rust – and “perfect solutions” don't exist… after 10 years of development. Maybe it's time to accept the fact that “perfect solutions” are not always feasible. That's already a reality, Rust is already like this. It's time to just accept that.
Posted Feb 13, 2025 18:19 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
This is exactly the same attitude that everyone had towards borrow checking before Rust existed.
Sure, *eventually* you probably do have to give up and stop trying to Change The World - a language has to be finished at some point. But Rust is very obviously not there yet. It would be a shame if they gave up on having nice, well-behaved abstractions just because some of the theory is inconveniently complicated. Traits and bounds are improving, slowly, along with most of the rest of the language. For example, the standard library already makes some use of specialization[1][2], a sorely missing feature that is currently in the process of being stabilized.
Rust is not saying "that's impossible." They're saying "we want to take the time to try and do that right." I say, let them. Even if they fail, we can learn from that failure and adapt. But if you never try, you can never succeed.
[1]: https://rust-lang.github.io/rfcs/1210-impl-specialization.html
Posted Feb 12, 2025 20:28 UTC (Wed)
by plugwash (subscriber, #29694)
[Link] (16 responses)
There are two problems though.
1. A type can be forgotten without being dropped, known as "leaking" the value. There was a discussion in the run up to rust 1.0 about whether this should be considered unsafe, which ultimately came down on the side of no.
Posted Feb 12, 2025 21:55 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (15 responses)
A weakly-undroppable type may not be dropped. It may be leaked, forgotten, put into an Arc/Rc cycle, or smuggled inside of ManuallyDrop. It may also be dropped on an unwinding panic (with the understanding that panics are bad, unwinding is worse, and some features just won't play nicely with them). It does not provide a safety guarantee, so unsafe code must assume that undroppable types may still get lost by other means.
A strongly-undroppable type may not be dropped, and additionally provides a safety guarantee that its drop glue is never called. It cannot be dropped on unwinding panic, so such panics are converted into aborts if they would drop the object (you can still use catch_unwind to manually handle the situation, if you really want to). Unsafe code may assume that the object is never dropped, but still may not make any other assumptions about the object's ultimate fate.
An unleakable type may not be leaked or forgotten. It must always be dropped or destructured. You may not put it into Rc, Arc, std::mem::forget, ManuallyDrop, MaybeUninit (which is just ManuallyDrop in a funny hat), Box::leak, or anything else that could cause it to become leaked (unsafe code may do some or all of these things, but must not actually leak the object). You also may not define a static of that type, because statics don't get dropped at exit and are functionally equivalent to leaking an instance on startup.
A strongly-linear type is both strongly-undroppable and unleakable. It cannot be dropped or leaked by any means, and is subject to all of the constraints listed above. It may only be destructured by code that has visibility into all of its fields.
Now my commentary:
* Weakly-undroppable types are already very useful as a form of linting. For example, the std::fs::File type currently does not have a close() method, because the type already closes itself on drop. But that means that it swallows errors when it is closed. The documentation recommends calling sync_all() (which is equivalent to fsync(2)) if you care about errors, but I imagine that some filesystem developers would have choice words about doing that in lieu of checking the error code from close(2). If File were weakly-undroppable, then it could provide a close() method that returns errors (e.g. as Result<()>) and fails the compilation if you forget to call it. This isn't a safety issue since the program won't perform UB if you forget to close a file, so we don't need a strong guarantee that it is impossible to do so. We just want a really strong lint to stop the user from making a silly mistake. It would also help with certain problems involving async destructors, but I don't pretend to understand async nearly well enough to explain that. On the downside, it would interact poorly with most generic code, and you'd probably end up copying the semantics of ?Sized to avoid massive backcompat headaches (i.e. every generic type would be droppable by default, and would need to be qualified as ?Drop to allow undroppable types).
Of course, there is another problem: We cannot guarantee that an arbitrary Turing-complete program makes forward progress. If the program drops into an infinite loop, deadlock, etc., then no existing object will ever get cleaned up, meaning that everything is de facto leaked whether our types allow for it or not. To some extent, this is fine, because a program stuck in an infinite loop will never execute unsafe code that makes assumptions about how objects are cleaned up. To some extent, it is not fine, because we can write this function (assuming that ?Leak means "a type that can be unleakable"):
fn really_forget<T: Send + 'static + ?Leak>(t: T){
...or some variation thereof, and there is probably no general way to forbid such functions from existing. So any type that is Send + 'static (i.e. has no lifetime parameters and can be moved between threads) should be implicitly Leak.
The "obvious" approach is to make 'static imply Leak, and require all unleakable (and maybe also undroppable) types to have an associated lifetime parameter, which describes the lifetime in which they are required to be cleaned up. More pragmatically, you might instead say that 'static + !Leak is allowed as a matter of type coherence, but provides no useful guarantees beyond 'static alone, and unsafe code must have a lifetime bound if it wants to depend on something not leaking. I'm not entirely sure how feasible that is in practice, but it is probably more theoretically sound than just having !Leak imply no leaks by itself, and unsafe code probably does want to have a lifetime bound anyway (it provides a more concrete and specific guarantee than "no leaks," since it allows you to assert that object A is cleaned up no later than object B).
Posted Feb 12, 2025 22:33 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (5 responses)
Note that Rust could, with some effort, have a method fn close (self) -> io::Result<()>, without the weakly-undroppable property, so that developers who really care can get at the errors from closing a file. It'd be stronger if it'd fail the compilation if you forgot to call it, but it'd resolve the issue with those filesystem developers.
In practice, though, I'm struggling to think of a case where sync_all() (also known as fsync(2)) is the wrong thing, and checking returns from close(2) is the right thing. The problem is that close returning no error is a rather nebulous state - there are not really any guarantees about what it means, beyond Linux telling you that the FD is definitely closed (albeit this is non-standard - the FD state is "unspecified" on error by POSIX) - whereas fsync at least guarantees that this file's data and its metadata are fully written to the permanent storage device.
Posted Feb 12, 2025 23:06 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (4 responses)
High-performance file I/O is an exercise in optimism. By not calling fsync, you accept some probability of silent data loss in exchange for higher performance. But there's an even more performant way of doing that: You can just skip all the calls to File::write(), and for that matter, skip opening the file altogether, and just throw away the data now.
Presumably, then, it is not enough to just maximize performance. We also want to lower the probability of data loss as much as possible, without compromising performance. Given that this is an optimization problem, we can imagine various different points along the tradeoff curve:
* Always call fsync. Zero probability of silent data loss (ignoring hardware failures and other things beyond our reasonable control), but slower.
Really, it's the middle one that makes no sense, since one main memory cache miss is hardly worth writing home about in terms of performance. Maybe if you're writing a ton of small files really quickly, but then errno will be in cache and the performance cost becomes entirely unremarkable.
Posted Feb 13, 2025 10:13 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (3 responses)
This is the core of our disagreement - as far as I can find, the probability of silent data loss on Linux is about the same whether or not you check the close return code, with the exception of NFS. Because you can't do anything with the FD after close, all FSes but NFS seem to only return EINTR (if a signal interrupted the call) or EBADF (you supplied a bad file descriptor), and in either case, the FD is closed. NFS is slightly different, because it can return the server error associated with a previous write call, but it still closes the FD, so there is no way to recover.
Posted Feb 13, 2025 17:54 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
Of course error-on-close is recoverable, you just delete the file and start over. Or if that doesn't work, report the error to the user so that they know their data has not been saved (and can take whatever action they deem appropriate, such as saving the data to a different filesystem, copying the data into the system clipboard and pasting it somewhere to be preserved by other means, etc.).
Posted Feb 13, 2025 18:12 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
Until you can't start over ... which is probably par for the course in most data entry applications ...
Cheers,
Posted Feb 14, 2025 10:49 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
Deleting the file is the worst possible thing to do with an error on close - two of the three are cases where the data has been saved, and it's an oddity of your code that resulted in the error being reported on close. The third is one where the file is on an NFS mount, the NFS server is set up to write data immediately upon receiving a command (rather than on fsync, since you won't get a delayed error for a write if the NFS server is itself caching) and you didn't fsync before close (required on NFS to guarantee that you get errors).
But even in the latter case, close is not enough to guarantee that you get a meaningful error that tells you that the data has not been saved - you need fsync, since the NFS server is permitted to return success to all writes and closes, and only error on fsync.
And just to be completely clear, I think this makes error on close useless, because all it means in most cases is either "your program has a bug" or "a signal happened at a funny moment". There's a rare edge case if you have a weird NFS setup where an error on close can mean "data lost", but if you're not in that edge case (which cannot be detected programmatically, since it depends on the NFS server's configuration), the two worst possible things you can do if there's an error on close are "delete the file (containing safe data) and start over" and "report to the user that you've saved their data, probably, so that they can take action just in case this is an edge-case system".
On the other hand, fsync deterministically tells you either that the data is as safe as can reasonably be promised, or that it's lost, and you should take action.
Posted Feb 13, 2025 13:24 UTC (Thu)
by daroc (editor, #160859)
[Link] (8 responses)
You are of course right in general; the price that Idris and Agda pay for being able to say that some programs terminate is that the termination checker is not perfect, and will sometimes disallow perfectly okay programs. So I don't think it's necessarily a good idea for Rust to add termination checking to its type system, but it is technically a possibility.
Posted Feb 13, 2025 15:13 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (7 responses)
Can't Rust have several checkers? If any one of them returns "okay", then the proof has passed and the code is okay. The programmer could then also add hints, maybe saying "run these checkers in this order", or "don't bother with these checkers", or whatever. So long as the rule is "any positive result from a checker is okay", that could reduce the checking time considerably.
Oh - and I don't know how many other languages do this sort of thing, but DataBASIC decrements -1 to 0 to find the end of an array :-) It started out as a "feature", and then people came to rely on it so it's standard documented behaviour.
(I remember surprising a C tutor by adding a bunch of bools together - again totally normal DataBASIC behaviour, but it works in C as well because I believe TRUE is defined as 1 in the standard?)
Cheers,
Posted Feb 13, 2025 15:34 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (5 responses)
If we could ensure that "undecided" was small enough, we'd not have a problem - but the problem we have is that all known termination checkers reject programs that humans believe terminate.
Posted Feb 13, 2025 18:01 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (4 responses)
It only takes one checker to return success, and we know that that Rust code is okay. So if we know (or suspect) which is the best checker to run, why can't we give Rust hints, to minimise the amount of checking Rust (has to) do. Which then means we can run more expensive checkers at less cost.
Cheers,
Posted Feb 13, 2025 18:31 UTC (Thu)
by daroc (editor, #160859)
[Link] (2 responses)
So the tradeoff will always be between not being able to check this property, or being able to check it but rejecting some programs that are probably fine.
That said, I do think there is a place for termination checking in some languages — the fact that Idris has it lets you do some really amazing things with dependent typing. Whether _Rust_ should accept that tradeoff is a matter of which things it will make harder and which things it will make easier, not just performance.
Posted Feb 14, 2025 15:12 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (1 responses)
Posted Feb 14, 2025 15:27 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
Idris then uses this to determine whether it can evaluate a function at compile time (total functions) or whether it must defer to runtime (partial functions). This becomes important because Idris is dependently typed, so you can write a type that depends on the outcome of evaluating a function; if that function is total, then the type can be fully checked at compile time, while if it's partial, it cannot.
Posted Feb 14, 2025 11:02 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
Posted Feb 13, 2025 16:06 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link]
It works because in "_Bool + _Bool" both operands are first promoted to int, and the true values are implicitly converted to 1s.
Posted Feb 11, 2025 14:33 UTC (Tue)
by alx.manpages (subscriber, #145117)
[Link] (56 responses)
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3422.pdf>
There's a lot of work in making C great again (lol). There are certainly rough corners in C, and they should be fixed, but dumping the language and learning a completely new language, just to eventually find that the new language has different rough corners, is a bad idea. Let's fix the language instead.
Posted Feb 11, 2025 14:51 UTC (Tue)
by intelfx (subscriber, #130118)
[Link] (35 responses)
Not to come off as a zealot, but I'm really skeptical that nullability annotations can cover even a fraction of the convenience and safety benefits that pervasive use of Option<> brings to Rust (let alone ADTs in general, because there's so much more than just Option<>).
Reducing this to "C has rough corners, so what, Rust has different rough corners too" feels almost disingenuous.
Posted Feb 11, 2025 15:12 UTC (Tue)
by alx.manpages (subscriber, #145117)
[Link] (34 responses)
Time will tell. If _Optional proves to not be enough, and there's something else needed in C to have strong safety against NULL pointers, we'll certainly research that. So far, _Optional looks promising.
Posted Feb 11, 2025 20:22 UTC (Tue)
by khim (subscriber, #9252)
[Link] (33 responses)
To understand why addition of Just look at the signature of strstr: it takes two pointers to immutable strings… yet returns a pointer to a mutable string! WTH? What's the point? What kind of safety is that? The point is that this function was created before the introduction of Both C++ and Rust solve the issue in the same way: instead of trying to decide whether the result is a mutable or immutable string When you add new invariants to the type system, to really fully benefit from them one needs to, essentially, rewrite everything from scratch… and if you plan to rewrite all the code anyway, then why not pick another, better and more modern language? P.S. The real irony is, of course, that kernel developers understand that better than anyone. They pretty routinely do significant and complicated multi-year surgery to rebuild the whole thing on new foundations (how many years did it take to remove the BKL, remind me?), but when the proposal is to replace the language… the opposition becomes religious rather than technical, for some reason…
Posted Feb 11, 2025 22:23 UTC (Tue)
by alx.manpages (subscriber, #145117)
[Link] (32 responses)
You may be happy that C23 changed that.
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pd...>
The prototype of strstr(3) is now
QChar *strstr(QChar *s1, const char *s2);
which means it returns a pointer that is const-qualified iff s1 is const-qualified. It is (most likely) implemented as a type-generic macro with _Generic().
The other string APIs have of course been improved in the same way.
So yes, the language evolves slowly, but it continuously evolves into a safer dialect.
---
> And everyone remembers the fate of noalias, isn't it?
We're discussing something regarding noalias. It's not easy to come up with something good enough, though. Don't expect it to be there tomorrow. But it's not been forgotten.
---
> Both C++ and Rust solve the issue in the same way: instead
Having researched string APIs for almost half of my workday over some years, I find that returning a pointer is usually more useful. You just need to get the issue with qualifiers right. It needed _Generic(), but we've finally arrived there.
Posted Feb 11, 2025 22:53 UTC (Tue)
by khim (subscriber, #9252)
[Link] (31 responses)
So it took 33 years to fix one function. Great. How quickly would it propagate through a project of Linux's size, at this rate? 330 or 3300 years? Sure, but the evolution speed is so glacial that it only makes sense if you postulate, by fiat, that a full rewrite in another language is not an option. And don't forget that backers may, at some point, just throw in the towel, unable to deal with the excruciating inability to change anything of substance. Apple has Swift; Google is still picking between Rust and Carbon, but the eventual decision would be one or the other and, notably, not C or C++; Microsoft seems to be thinking about Rust too… so who would be left to advance C and C++ after all the players that actually wanted to change them leave? The question is not whether it's forgotten or not, but whether we can expect to see programs where most pointers carry such annotations in any reasonable timeframe. And the simple answer, given the above example, is that one may spend maybe 10 or 20 years rewriting the Linux kernel in Rust (yes, that's big work, but a journey of a thousand miles begins with a single step)… or go with C – and then never achieve that. Simply because in 50 or 100 years, well before C would be ready to adopt such a paradigm, everything would be rewritten in something other than C anyway. Simply because it's hard to find anyone under 40 (let alone anyone under 30) who even wants to touch C if they have a choice. No, it's not. It's only “more useful” if you insist on null-terminated string abominations. If your strings are proper slices (or standalone strings on the heap… only C conflates them; C++, Rust and even languages like Java and C# have separate types) then returning a pointer is not useful. You either need a generic type that returns a slice, or you return an index. And returning an index is more flexible. No, you also need to guarantee that C would continue to be used. That's a tall order.
I wonder why no one ever made a proper C/C++ replacement (as in: a language designed to interoperate with C/C++ but not built on top of the C/C++ core) before Rust… but now that it's done, we may finally face the question of why we should continue to support strange and broken C semantics with null-terminated strings… inventing crazy schemes and CPU extensions – all to support something that shouldn't have existed in the first place. That's not a question for the next 3-5 years, but in 10 years… when the world separates into competing factions… it would be interesting to see whether any of them stays faithful to C/C++, and what they would pick instead to develop their “sovereign software lands”.
Posted Feb 11, 2025 23:22 UTC (Tue)
by alx.manpages (subscriber, #145117)
[Link] (10 responses)
Not one. The entire libc has been updated in that direction.
> How quickly would it propagate through project of Linux size, at this rate?
The problem was devising the way to do it properly. Once we have that, the idea can be propagated more easily. It's not like you need a decade to update each function.
The kernel can probably implement this for their internal APIs pretty easily. The kernel already supports C11, so all the pieces are there. The bottleneck is in committer and reviewer time.
> Simply because it's hard to find anyone under 40 (let alone anyone under 30) who even wants to touch C if they have a choice.
I'm 31. I hope to continue using C for many decades. :)
> It's only “more useful” if you insist on zero-string abominations.
I do enjoy NUL-terminated strings, yes, which is why I find returning pointers more useful. The problems with strings (why they have been blamed for so long) weren't really the fault of strings themselves, but of the language, which wasn't as expressive as it could be. That's changing.
Posted Feb 11, 2025 23:45 UTC (Tue)
by khim (subscriber, #9252)
[Link] (7 responses)
You certainly would be able to do that; after all, Cobol 2023 and Fortran 2023 both exist. The question is how many outside of the “Enterprise” (where nothing is ever updated, except when it breaks down and falls apart completely) would care. The problem with strings is precisely the strings. It's not even the fact that NUL is forbidden inside (after all, Rust strings are guaranteed to be UTF-8 inside). The problem lies with the fact that something that should be easy and simple (just look in a register and you know the length) is incredibly hard with null-terminated strings. It breaks speculation, requires special instructions (like what SSE4.2 added, or the “no fault vector load” used by RISC-V), and plays badly with many algorithms (why does an operation that shouldn't change anything in memory at all, like splitting a string in two, ever change anything?). Null-terminated strings are not quite up there with the billion-dollar mistake, but they are a very solid contender for 2nd place. Try again: _Generic is C11 and tgmath is C99. The means were there for 12 or 24 years (depending on how you count); there was just no interest… till the 100% guaranteed job-security stance that C would never be replaced (simply because all prospective “C killers” were either built around the C core or unable to support effective interop with C) was threatened by Rust and Swift. Then and only then did the wheels start moving… but I'm pretty sure they will soon be clogged again… when it is realized that on one side only legacy projects are interested in using C anyway, and on the other side the majority of legacy projects just don't care to change anything unless they are forced to. Yeah, but that's precisely the issue: while existing kernel developers may want to perform such a change, they are already overworked and overstressed… and newcomers, normally, want nothing to do with C.
I guess the fact that exceptions like you exist gives it a chance… but it would be interesting to see how it'll work. The kernel is one of the few projects that could actually pull that off.
Posted Feb 13, 2025 8:46 UTC (Thu)
by aragilar (subscriber, #122569)
[Link] (6 responses)
Posted Feb 13, 2025 9:59 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (5 responses)
Posted Feb 13, 2025 10:28 UTC (Thu)
by khim (subscriber, #9252)
[Link] (4 responses)
How would backports hurt anyone? Sure, you can only use GCC 12 on RHEL 7, but that beast was released more than ten years ago, before the first version of Rust, even! Sure, at some point backporting stops, but I don't think the hold-ups are “enterprise distros” (at least not RHEL specifically): these, at least, provide some updated toolchains. GCC 12 was released in 2022, so it's pretty modern by C standards. “Community distros” don't bother, most of the time.
Posted Feb 14, 2025 14:26 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (3 responses)
Posted Feb 14, 2025 14:29 UTC (Fri)
by khim (subscriber, #9252)
[Link] (2 responses)
Who even cares what they use for the development of the platform itself? Developers shouldn't even care about that; it's an internal implementation detail.
Posted Feb 17, 2025 8:49 UTC (Mon)
by taladar (subscriber, #68407)
[Link] (1 responses)
Posted Feb 17, 2025 9:15 UTC (Mon)
by khim (subscriber, #9252)
[Link]
How is that relevant? The Linux kernel was all too happy to adopt features not implemented by clang, and patches needed to support clang – and clang, at that point, was already used by Android, the most popular Linux distribution, used by billions… why should RHEL be treated differently? Let RHEL developers decide what to do with their kernel: they can create a special kgcc package (like they already did years ago) or rework features in any way they like.
Posted Feb 12, 2025 6:14 UTC (Wed)
by interalia (subscriber, #26615)
[Link] (1 responses)
In theory the kernel could switch easily enough given review time, as you say, but would doing this also require bumping the required compiler version for the kernel? If so, I'm not sure they would feel safe doing so for quite a few years, and Rust would also advance in the meantime.
Posted Feb 12, 2025 8:41 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link]
Any compiler that supports C11 should be able to support these.
Here's an example of how to write such a const-generic API:
```
#define my_strchr(s, c)                                 \
    _Generic((s),                                       \
        const char *: (const char *) strchr((s), (c)),  \
        char *:       strchr((s), (c))                  \
    )
```
Posted Feb 12, 2025 11:17 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (19 responses)
Actually, I worked on a project that used counted strings (not terminated by a NUL, unless we needed to pass them to syscalls), and even there, functions returning a pointer were overwhelmingly more used than ones returning a count. Consider creating a counted string by concatenating several pieces: with pointer-returning functions each call's result feeds the next, while equivalent code that carries a count through every step would be more complex (and thus more unsafe).
Posted Feb 12, 2025 12:04 UTC (Wed)
by excors (subscriber, #95769)
[Link] (17 responses)
sds s = sdsempty();
s = sdscatsds(s, s1);
s = sdscatsds(s, s2);
s = sdscatsds(s, s3);
(and in the unlikely event that you're doing a lot of concatenation and really care about minimising malloc calls, you can add `s = sdsMakeRoomFor(s, sdslen(s1) + sdslen(s2) + sdslen(s3));` near the top). That makes it both simpler and safer than the original code. You should never be directly manipulating the length field.
(Of course in almost all other languages the equivalent code would be `s = s1 + s2 + s3;`, which is even simpler and safer.)
Posted Feb 12, 2025 12:40 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (16 responses)
I disagree with the last sentence. It was true in the past, without powerful static analyzers. Managed memory within APIs hides information to the compiler (and static analyzer), and thus provides less safety overall, provided that you have a language expressive enough and a static analyzer powerful enough to verify the program.
Consider the implementation of mempcpy(3) as a macro around memcpy(3) (or an equivalent inline function that provides the same information to the compiler):
#define mempcpy(dst, src, n) (memcpy(dst, src, n) + n)
A compiler (which knows that memcpy(3) returns the input pointer unmodified; this could be expressed for arbitrary APIs with an attribute in the future, but for now the compiler knows memcpy(3) magically) can trace all offsets being applied to the pointer 'p', and thus enforce array bounds statically. You don't need dynamic verification of the code.
With a managed string like you propose, you're effectively blinding the compiler to all of those operations. You're blindly placing your trust in the string library. If the library has a bug, you'll suffer it. But also, if you misuse the library, you'll have no help from the compiler.
Posted Feb 12, 2025 12:49 UTC (Wed)
by khim (subscriber, #9252)
[Link] (15 responses)
Why? What's the difference? If everything is truly “static enough” then the managed string can be optimized away. That's not theory: if you look at the Rust example, the temporary string is completely elided from the generated code, and a C compiler (which is, essentially, the exact same compiler) should be able to do the same. So you would trust your ad-hoc code, but wouldn't trust a widely tested and reviewed library? Hasn't the history of Linux kernel fuzzing shown us that this approach simply doesn't work?
Posted Feb 12, 2025 13:04 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (14 responses)
I personally use NUL-terminated strings because they require less (almost none) ad-hoc code. I'm working on a hardened string library based on <string.h>, providing some higher-level abstractions that preclude the typical bugs.
<https://github.com/shadow-maint/shadow/tree/master/lib/st...>
> Why? What's the difference?
Complexity. Yes, you can write everything inline and let the compiler analyze it. But the smaller the APIs are, the less work you impose on the analyzer, and thus the more effective the analysis is (fewer false negatives and positives). You can't beat the simplicity of <string.h> in that regard.
Posted Feb 12, 2025 13:16 UTC (Wed)
by khim (subscriber, #9252)
[Link] (13 responses)
Nope. Things don't work like that. A smaller API may help a human to manually optimize things, because humans are awfully bad at keeping track of hundreds and thousands of independent variables, but really good at finding non-trivial dependencies between a few of them. A compiler optimizer is the exact opposite: it doesn't have the smarts to glean all possible optimizations from a tiny, narrow API, but it's extremely good at finding and eliminating redundant calculations across thousands of lines of code. Possibly. And if your goal is something extremely tiny (like code for the smallest possible microcontrollers) then this may be a good choice (people have successfully used Rust on microcontrollers, but usually without the standard library, since it's too big for them). But using these for anything intended to run on “big” CPUs with caches measured in megabytes? Why?
Posted Feb 12, 2025 13:27 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (12 responses)
I never cared about optimized code. I only care about correct code. C++ claims to be safer than C, among other things by providing (very-)high-level abstractions in the library. I think that's a fallacy.
There's a reason why -fanalyzer works reasonably well in C and not in C++. All of that complexity triggers many false positives and negatives. Not being able to run -fanalyzer in C++ makes it a less safe language, IMO.
The optimizer might be happy with abstractions, but the analyzer not so much. I care about the analyzer.
> But using these for anything intended to be used on “big” CPUs with caches measured in megabytes? Why?
Safety.
My string library has helped find and fix many classes of bugs (not just instances of bugs) from shadow-utils. It's a balance between not adding much complexity (not going too high-level), but going high enough that you get rid of the common classes of bugs, such as off-by-ones with strncpy(3), or passing an incorrect size to snprintf(3), with for example a macro that automagically calculates the size from the destination array.
You'd have a hard time introducing bugs with this library. Theoretically, it's still possible, but the library makes it quite difficult.
Posted Feb 12, 2025 13:53 UTC (Wed)
by khim (subscriber, #9252)
[Link] (2 responses)
Then why are you even using C, and why are we having this discussion? No, it's not. The fact that we have complicated things like browsers implemented in C++, while nothing similar was ever implemented in C, is proof enough of that. C++ may not be as efficient as C (especially if we care about size and memory consumption) but it's definitely safer. And if you don't care about efficiency then any memory-safe language would do better! Even BASIC! Why do you care about the analyzer if the alternative is to use something that simply makes most things the analyzer can detect impossible? Or even something like WUFFS, if you need extra assurances? But again: all these tricks are important if your goal is speed first, safety second. If your primary goal is safety then a huge range of languages from Ada to Haskell and even Scheme would be safer. These are all examples of bugs that any memory-safe language simply wouldn't allow. C++ would allow them, of course, but that's because C++ was designed to be “as fast as C, but safer”… one may argue about whether it achieved that or not, but if you don't target the “as fast as C” bucket then there are a bazillion languages that are safer.
Posted Feb 13, 2025 10:43 UTC (Thu)
by alx.manpages (subscriber, #145117)
[Link] (1 responses)
Posted Feb 13, 2025 19:05 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
No, you don't. No human can keep track of all of the C pitfalls in non-trivial code.
Even the most paranoid DJB code for qmail had root holes, and by today's standards it's not a large piece of software.
Posted Feb 14, 2025 23:30 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Yes, I agree. However, IIRC, it is because its main author (David Malcolm) is vastly more familiar with C than C++. Clang also has something like it in some of its `clang-tidy` checks, but I agree that GCC's definitely has a different set of things it covers, so they can coexist nicely.
Posted Feb 15, 2025 0:12 UTC (Sat)
by mb (subscriber, #50428)
[Link] (7 responses)
Why not use an interpreted language then?
>My string library has helped find and fix many classes of bugs ...
Sure. Thanks for that.
>but the library makes it quite difficult.
Modern languages make it practically impossible.
Posted Feb 15, 2025 0:24 UTC (Sat)
by alx.manpages (subscriber, #145117)
[Link] (6 responses)
Because C is my "mother tongue" regarding computers. I can write it much better than other languages, just like I can speak Valencian better than other --possibly easier-- languages.
Posted Feb 15, 2025 0:51 UTC (Sat)
by mb (subscriber, #50428)
[Link] (5 responses)
That explains your "reasoning" indeed.
Posted Feb 15, 2025 22:29 UTC (Sat)
by alx.manpages (subscriber, #145117)
[Link] (4 responses)
Posted Feb 15, 2025 22:40 UTC (Sat)
by mb (subscriber, #50428)
[Link] (3 responses)
Posted Feb 15, 2025 23:05 UTC (Sat)
by alx.manpages (subscriber, #145117)
[Link] (2 responses)
Why did you put it in quotes? Were you implying that my reasoning is inferior to yours? Isn't that offensive? Please reconsider your language.
> And because "I always did it like this" isn't a reasoning that helps in discussions.
It is, IMO. I'm not a neurologist. Are you? I'm not an expert in how people learn languages, or in why learning a secondary language isn't as easy as learning a mother tongue. But it is common knowledge that one speaks one's mother tongue much better than languages learned after it. The burden of justification should be on those who argue the opposite.
Or should I take at face value that I learnt the wrong language, and that somehow learning a different one will magically make me write better *without regressions*? What if it doesn't? And why should I trust you?
Posted Feb 15, 2025 23:12 UTC (Sat)
by mb (subscriber, #50428)
[Link] (1 responses)
I will from now on block you here on LWN and anywhere else.
Posted Feb 15, 2025 23:31 UTC (Sat)
by alx.manpages (subscriber, #145117)
[Link]
Okay. You don't need to. Just asking me to not talk to you would work just fine. I won't, from now on. I won't block you, though.
Posted Feb 12, 2025 12:15 UTC (Wed)
by khim (subscriber, #9252)
[Link]
But why would you need all that complexity? If you work with strings a lot… wouldn't you have convenience methods? In Rust you would write a one-line concatenation. Sure, NUL-terminated strings are a bad design from beginning to end, but the only justification for that design was the need to produce something decent without an optimizing compiler and in 16KB (or were they up to 128KB by then?) of RAM. Today you have more RAM in your subway ticket, and optimizing compilers exist, so why stick to all these manual manipulations where none are needed?
Posted Feb 11, 2025 20:13 UTC (Tue)
by roc (subscriber, #30627)
[Link] (17 responses)
And C has so many "rough edges". These aren't even the biggies. The complete lack of lifetime information in the type system, and the UB disaster, are much worse. Saying "well, Rust has rough edges too" and implying that that makes them kind of the same is misdirection.
Posted Feb 11, 2025 22:31 UTC (Tue)
by alx.manpages (subscriber, #145117)
[Link] (16 responses)
I suspect it's possible to add those guarantees in C with some new attribute that you could invent, or some other technique. There's an experimental compiler that did that (or so I heard). If someone adds such a feature to GCC or Clang, and proves that it makes C safer, I'm sure people will pick it up, and it will eventually be standardized.
Posted Feb 12, 2025 20:31 UTC (Wed)
by roc (subscriber, #30627)
[Link] (15 responses)
Posted Feb 12, 2025 20:53 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (14 responses)
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3423.pdf>
Some of the ideas might be adoptable in ISO C. I don't know the trade-offs, etc. I asked the author of the paper to propose standalone features that could be acceptable in ISO C, so that we can discuss them.
Who would adopt new C dialects that are safer? Programmers that want to keep writing C in the long term without having their programs replaced by rusty versions. I would.
Posted Feb 12, 2025 21:30 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (9 responses)
> TrapC memory management is automatic, cannot memory leak, with pointers lifetime-managed not garbage collected
Is impossible. You need to have something like a borrow-checker for that, and it requires heavy investment from the type system.
Without that, you're limited to region inference (like in Cyclone C), and it's not powerful enough for anything serious.
Posted Feb 17, 2025 18:02 UTC (Mon)
by anton (subscriber, #25547)
[Link] (8 responses)
The compiler would either prove that the code is safe or insert run-time checks, based on a sophisticated type system. I.e., one would get what rewriting in Rust gives, but one would need less effort, and could do it piecewise.
This work sounded promising, but the transfer from research project into production never happened. Instead, after the research project ended, even the results of the research project mostly vanished (many dead links). What I found is the Ivy package, and the Deputy and Heapsafe manuals.
Instead of adding such annotations to C, people started to rewrite stuff in Rust, which seems to be a more expensive proposition. My current guess is that it's a cultural thing: Many, too many C programmers think their code is correct, so there is no need to add annotations that may slow down the code. And those who think otherwise have not picked up the Ivy ideas, but instead switched to Rust when that was available.
Posted Feb 17, 2025 19:03 UTC (Mon)
by mbunkus (subscriber, #87248)
[Link] (2 responses)
Just look at most C/C++ code bases (other languages, too) & observe how many variables aren't marked "const" that easily could be. Or how many functions could be "static" but aren't. Or that the default is to copy pointers instead of moving them.
Rust has the huge advantage of having made the safe choices the default ones instead of optional ones, and the compiler helps us remember when we forget. In C/C++ all the defaults are unsafe, and there's almost no help from the compiler.
Posted Feb 18, 2025 8:16 UTC (Tue)
by anton (subscriber, #25547)
[Link] (1 responses)
Posted Feb 18, 2025 16:11 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
Posted Feb 17, 2025 19:13 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
Ivy uses a garbage collector.
C just offloads the safety proof to the developer. Rust is really the first language that tries to _assist_ users with proving the lifetime correctness.
What's worse, there is no real way to make it much more different from Rust. Any other attempt to implement the lifetime analysis will end up looking very similar. We already see that with SPARK in Ada: https://blog.adacore.com/using-pointers-in-spark
Posted Feb 17, 2025 20:11 UTC (Mon)
by daroc (editor, #160859)
[Link] (3 responses)
There's one project I've had my eye on that essentially replaces garbage collection with incremental copying and linear references. It's definitely not ready for production use yet, and is arguably still a form of garbage collection even though there are no pauses and no separate garbage collector, but it's an interesting approach. Then there are languages like Vale that are experimenting with Rust-like approaches but with much better ergonomics.
None of which means you're wrong — your options right now are basically garbage collection, Rust, or manual memory management — but I do feel hopeful that in the future we'll see another academic breakthrough that gives us some additional options.
Posted Feb 17, 2025 22:13 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
I really doubt that the status quo (borrow checker or GC) is going to change. We probably will get more powerful primitives compatible with the borrow checker, though.
Posted Feb 17, 2025 22:48 UTC (Mon)
by daroc (editor, #160859)
[Link] (1 responses)
Posted Feb 18, 2025 8:40 UTC (Tue)
by anton (subscriber, #25547)
[Link]
Concerning porting from malloc()/free() to something that guarantees no dangling pointers, no double free() and maybe no leaking: The programmer who uses free() uses some reasoning why the usage in the program is correct. If the new memory-management paradigm allows expressing that reasoning, it should not be hard to port from malloc()/free() to the new memory-management paradigm. One problem here is that not free()ing malloc()ed memory is sometimes a bug (leakage) and sometimes fine; one can mark such allocations, but when the difference is only clear in usage of functions far away from the malloc(), that's hard.
Posted Feb 13, 2025 4:32 UTC (Thu)
by roc (subscriber, #30627)
[Link]
All that paper says about the TrapC compiler that it is "in development".
That document makes the extraordinary claim that "TrapC memory management is automatic, cannot memory leak, with pointers lifetime-managed not garbage collected". It nowhere explains how this is done, not even by example. Strange for such an extraordinary and important achievement.
I can see why C advocates want to believe that a memory-safe extension of C is just around the corner. I'll believe it when I see it.
Posted Feb 14, 2025 23:28 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Feb 14, 2025 23:39 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
There _are_ attempts to do that with C. I know of this one: https://github.com/pizlonator/llvm-project-deluge/blob/de...
Posted Feb 15, 2025 22:22 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 11, 2025 20:15 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
That's a lot of "might" for an unknowable future. That's a bad gamble. And it's predicated upon this irrational steady-state / zero-sum idea that if Rust is better in some ways than C, it must be worse in other ways. Not so. Rust isn't _perfect_ but that doesn't preclude it being better. Seven needn't be the largest possible number in order to be bigger than three; Rust can be a better choice than C while BOTH of these propositions remain true: C was a good idea in the 1970s (fifty years ago!); Rust is not perfect and will itself be replaced in time.
Posted Feb 11, 2025 22:47 UTC (Tue)
by alx.manpages (subscriber, #145117)
[Link]
Some features were accepted into C2y a few months ago, and will (most likely) be available in GCC 16. For example, there's the countof() operator.
<https://thephd.dev/the-big-array-size-survey-for-c-results>
My patch was actually ready around November, but I'm holding it due to lack of agreement in the name of the operator.
That's one of the features that will bring some safety to the language soon. And there are some obvious extensions that will make it even better: for example, being able to reference the number of elements of an array even when it's an array parameter (and has thus decayed to a pointer). That's not yet standardized, but we're working on getting it into GCC soon. There are several other features that will similarly make the language safer.
Features will arrive in GCC soon, even if there's no release of ISO C in the near future.
> It might then be implemented across major compilers like GCC
The author of the proposal already has a working implementation in Clang. It's not yet in mainline Clang, but I don't expect it to take much time to mainline once it's been accepted into C2y. It might be way sooner than you expect.
> Rust isn't _perfect_ but that doesn't preclude being better.
I don't say it's not better. But most changes in the FOSS community are merged very slowly, precisely to prove that they are good. Pressing Rust forward is the opposite of that. Maybe Rust proves to be as good as it sounds, but only time will tell. Every change is suspect of being worse until it is proven beyond reasonable doubt that it isn't.
Posted Feb 11, 2025 22:42 UTC (Tue)
by ojeda (subscriber, #143370)
[Link]
What I was trying to showcase is that, with the same amount of syntax/effort here, Rust gives us extra benefits that C does not.
In the slides I placed a grid of 4 combinations: (correct, incorrect) on one axis, (safe, unsafe) on the other. (I recommend opening the slides to see it).
The idea is that those are orthogonal -- all 4 combinations are possible. So, for instance, a perfectly correct C function that takes a pointer and dereferences it, will still always be considered "unsafe" in Rust terms.
In the "safe quadrants", we know the functions and their code are "safe" just by looking at them -- there is no need to look at other code or their callers. This is a "property" of the "source code" of those functions -- it is not a property of the binary, or something that requires whole-system analysis to know.
And knowing that is already valuable, since as implementors we know we will not introduce UB from within the function. And, as callers, that we will not introduce UB by just calling it.
There are caveats to that, of course (e.g. if we already had UB elsewhere, we can fabricate invalid inputs), but it is a very powerful distinction. For instance, if we copy-paste those two functions (i.e. even the incorrect one) into a safe program, even replacing an existing correct function, we shouldn't be able to introduce UB.
And this helps across time, too. In C, even if today you have a perfect C program, it is very hard to make a change that keeps it perfect, even just in terms of not triggering UB.
I hope that clarifies a bit. The "explanation" above is of course very informal and hand-wavy -- the idea in the talk was not to explain it in detail, but rather it was meant to be revealing for C developers, since it shows a "difference" that "is not there" (after all, the binaries end up being the same, no?), i.e. it tries to hint at what the concepts of "safe function" and "safe code" are about and get C programmers to think "hmm... that sounds interesting, I will look it up".
It can still crash
* A C++ reference points at the same thing for its entire lifetime - it cannot be changed to point at something else (like a const pointer, unlike a pointer-to-const).
* For that matter, there is no syntax to distinguish the reference from the pointee - it auto-dereferences where appropriate.
* For some reason, much of the documentation insists on avoiding the word "pointing" in relation to references, instead claiming that a reference is an "additional name" or "alias" for an object. But they are pointers, or at least there is no practical implementation other than as a pointer (on most reasonable architectures, in the general case, excluding cases where the optimizer manages to elide a whole object, etc.).
It can still crash
There are endless such possibilities.
That has nothing to do with "unsafe", though.
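A minimal sketch of that distinction: entirely safe Rust code can still crash, but it does so by panicking at a well-defined point rather than by touching invalid memory.

```rust
fn main() {
    let v = vec![1, 2, 3];
    // An out-of-bounds index panics -- a safe, defined crash,
    // not undefined behavior.
    let result = std::panic::catch_unwind(|| v[10]);
    assert!(result.is_err()); // the access panicked; memory stayed safe
}
```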
Ada does this too
It's quite common that the type system is used for logical correctness, too.
This type has nothing to do with safety, but it makes working with paths much less error prone than manually poking with strings. And it makes APIs better by clearly requiring a Path type instead of a random string.
If you do operations with such types the errors can't accidentally be ignored (and then lead to runtime crashes).
For example the compiler won't let you ignore the fact that paths can't always be converted to UTF-8 strings. It forces you to handle that logic error, unless you explicitly say in your code to crash the program if the string is not UTF-8.
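The point can be shown with the standard library's real Path API: converting a path to a UTF-8 string returns an Option, so the "not UTF-8" case cannot be silently ignored.

```rust
use std::path::Path;

fn main() {
    let p = Path::new("/some/dir/data.txt");
    // Path::to_str() returns Option<&str>: the type system forces us to
    // handle the case where the path is not valid UTF-8.
    match p.to_str() {
        Some(s) => println!("UTF-8 path: {s}"),
        None => eprintln!("path is not valid UTF-8"),
    }
    // Or explicitly opt in to crashing on non-UTF-8 paths:
    // let s = p.to_str().expect("path was not valid UTF-8");
}
```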
Ada does this too
2. You can't get an instance if the precondition fails to hold.
3. You can't invalidate the precondition while holding an instance (but you may be able to invalidate it and give up the instance in one operation). If you are allowed to obtain multiple instances that pertain to exactly the same precondition, then this rule is upheld with respect to all of those instances.
4. Functions can require you to have an instance of the type in order to call them.
5. The type is zero-size, or is a zero-cost wrapper around some other type (usually whatever object the precondition is all about), so that the compiler can completely remove it once your program has type-checked.
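The rules above can be sketched with a zero-size "witness" type; this is a loose illustration with made-up names (Initialized, Device), and rule 3 would additionally need a borrowing discipline not shown here.

```rust
// A zero-size witness type whose existence proves a precondition
// ("the device is initialized"); it compiles away entirely (rule 5).
struct Initialized;

struct Device { ready: bool }

impl Device {
    fn new() -> Device { Device { ready: false } }

    // The only way to obtain an Initialized is to actually initialize
    // (rule 2: no instance if the precondition fails to hold).
    fn init(&mut self) -> Initialized {
        self.ready = true;
        Initialized
    }

    // Requires the witness in order to be callable (rule 4).
    fn read(&self, _proof: &Initialized) -> u32 {
        debug_assert!(self.ready);
        42
    }
}

fn main() {
    let mut dev = Device::new();
    let proof = dev.init();
    println!("{}", dev.read(&proof)); // prints 42
}
```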
* An "affine" type may be used at most once (per instance of the type). In Rust, any type that does not implement Copy is affine when moved. Borrowing is not described by this formalism, but by happy coincidence, borrows simplify quite a few patterns that would otherwise require a fair amount of boilerplate to express.
* A "linear" type must be used exactly once. Linear types don't exist in Rust or most other languages, but Haskell is experimenting with them. Linear types enable a few typestate patterns that are difficult to express in terms of affine types (mostly of the form "invalidate some invariant, do some computation, then restore the invariant, and make sure we don't forget to restore it"). To some extent, this sort of limitation can be worked around with an API similar to Rust's std::thread::scope, but it would be annoying if you had to nest everything inside of closures all the time.
* "Ordered" types must be used exactly once each, in order of declaration. "Relevant" types must be used at least once each. These pretty much do not exist at all, at least as far as I can tell, but there is theory that explains how they would work if you wanted to implement them.
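The affine case is ordinary Rust move semantics; here is a minimal example (consume is a made-up helper) showing that a non-Copy value can be used at most once.

```rust
// Affine: a non-Copy value can be moved at most once.
fn consume(s: String) -> usize {
    s.len() // s is dropped at the end of this function
}

fn main() {
    let s = String::from("hello");
    let n = consume(s); // s is moved into consume
    // println!("{}", s); // error[E0382]: borrow of moved value: `s`
    assert_eq!(n, 5);
}
```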
> there are a bunch of small details that need to be worked out before this can be done
Ada does this too
use std::mem::{transmute_copy, ManuallyDrop};

#[repr(transparent)]
struct Linear<T>(T);
impl<T> Drop for Linear<T> {
fn drop(&mut self) {
const { assert!(false) };
}
}
impl<T> Linear<T> {
fn unbox(self) -> T {
// SAFETY: type Linear is #[repr(transparent)]
let t: T = unsafe { transmute_copy(&self) };
_ = ManuallyDrop::new(self);
t
}
}
panic! – not everyone likes to use -C panic=abort. And without -C panic=abort, the compiler would complain about any code that may potentially even touch panic!
Ada does this too
> It does not promise that an unreachable const block is never evaluated (in the general case, that would require solving the halting problem).
Ada does this too
ManuallyDrop and maybe some unsafe.
Ada does this too
Ada does this too
replace and put it into your crate without telling the compiler that it needs to do some special dance, then everything works.
…unsafe code and standard library – which would, essentially, turn it into a different language.
…static_assert-based linear types work are not two different kinds of work, but, in fact, exactly the same work.
Ada does this too
fn f() {
    // The panic may or may not occur when the program is built.
    const { panic!(); }
}
> Currently, the reference[1] says that this is allowed to fail at compile time:
Ada does this too
const evaluation happens after monomorphisation, not before. It's exactly like C++ templates and should work in a similar fashion and require similar resources. Templates in C++ were handled decently by EDG 30 years ago, and computers were much slower back then.
If static_assert in a linear type's destructor fires when it shouldn't, that's a matter of tightening the specifications and implementations, not a question of doing lots of math and theorem-proving.
Ada does this too
> We do not want to recreate the C++ situation where all the types match, but then something blows up in monomorphization.
Ada does this too
for. And said lending iterator was a showcase of GATs four years ago.
Ada does this too
[2]: https://doc.rust-lang.org/src/core/iter/traits/iterator.r...
Ada does this too
2. Values are dropped on panic, if the compiler can't prove your code won't panic then it will have to generate the drop glue.
Ada does this too
* I'm not sure that strongly-undroppable types provide much of a useful improvement over weakly-undroppable types, and it would be absurd to provide both features at once. But it could be argued that having one exceptional case where drop glue is invoked is a recipe for bugs, so it might be a cleaner implementation. OTOH, if you have chosen to have unwinding panics, you probably don't want them to magically transform into aborts just because some library in your project decided to use an undroppable somewhere. I currently think that weakly-undroppable is the more pragmatic choice, but I think there are valid arguments to the contrary.
* Unleakable types would allow the original API for std::thread::scope to be sound, and would probably enable some other specialized typestates. They're also pretty invasive, although probably not quite as much as undroppable types. They would not solve the File::close() problem.
* Strongly-linear types are just the combination of two of the above features. If either feature is too invasive to be practical, then so are strongly-linear types. But they would provide a strong guarantee that every instance is cleaned up in one and only one way.
std::thread::spawn(move || { let _t = t; loop { std::thread::park(); } });
}
Errors on close
For example, the std::fs::File type currently does not have a close() method, because the type already closes itself on drop. But that means that it swallows errors when it is closed. The documentation recommends calling sync_all() (which is equivalent to fsync(2)) if you care about errors, but I imagine that some filesystem developers would have choice words about doing that in lieu of checking the error code from close(2). If File were weakly-undroppable, then it could provide a close() method that returns errors (e.g. as Result<()>) and fail the compilation if you forget to call it.
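The sync_all() workaround mentioned above looks like this in practice (the file path here is made up for the example); it surfaces I/O errors that the implicit close-on-drop would silently swallow.

```rust
use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    let mut f = File::create("/tmp/lwn_close_example.txt")?;
    f.write_all(b"hello")?;
    // sync_all() (fsync) reports I/O errors; without it, errors at
    // close time are swallowed by the implicit drop.
    f.sync_all()?;
    drop(f); // close(2) happens here; any error from it is ignored
    Ok(())
}
```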
Errors on close
* Never call fsync and never check the close return code. Significant probability of silent data loss, but faster.
* Never call fsync, but do check the close return code. Presumably a lower probability of silent data loss, since you might catch some I/O errors, but almost as fast as not checking the error code (in the common case where there are no errors). In the worst case, this is a main memory load (cache miss) for the errno thread-local, followed by comparing with an immediate.
Errors on close
Wol
But, by the nature of Linux's close syscall, error on close means one of three things:
Errors on close
Ada does this too
Wol
It's not the runtime of the checker that's the problem; it's that we cannot write a checker that definitely accepts or rejects all reasonable programs. The underlying problem is that, thanks to Rice's Theorem (a generalisation of Turing's Halting Problem), a checker or combination of checkers can, at best, give you one of "undecided", "property proven to hold", or "property proven to not hold". There are two get-outs we use to make this tractable:
Running several checkers instead of one
Wol
Running several checkers instead of one
Idris has both as part of its totality checker; a function can be partial (in which case it may never terminate or produce a value - it can crash or loop forever) or total (in which case it must either terminate for all possible inputs, or produce a prefix of a possibly infinite result for all possible inputs).
Uses of termination checking
If you run all the currently known termination checker algorithms that actually come up with useful results (with the exception of "run until a timeout is hit, say it might not terminate if the timeout is hit, or it does terminate if the timeout is not hit"), you're looking at a few seconds at most. The pain is not the time that the algorithms we know of take, it's the fact that most of them return "undecided" on programs that humans can tell will terminate.
Running several checkers instead of one
Ada does this too
nullability annotations in C
To see that _Optional leads nowhere, one only needs to look at the fate of const. It was added when C was relatively young, and it was kinda-sorta adopted… but not really.
strchr predates const, and thus it has to work both with mutable and immutable strings (there was a difference between them when it was introduced)… so the best they could do was to add this blatant const safety violation. And everyone remembers the fate of noalias, doesn't it?
Instead of trying to decide whether the result is a mutable or an immutable string, find (both in C++ and in Rust) returns a position.
nullability annotations in C
> immutable strings… yet returns pointer to a mutable
> string! WTH? What's the point? What kind of safety it is?
> of trying to decide whether the result is mutable or
> immutable string find (both in C++ and in Rust) returns
> position.
> You may be happy that C23 changed that.
nullability annotations in C
const or noalias.
nullability annotations in C
> I'm 31. I hope to continue using C for many decades. :)
nullability annotations in C
alx@devuan:~/tmp$ cat strchr.c
```
const char *my_const_strchr(const char *s, int c);
char *my_nonconst_strchr(char *s, int c);

/* Dispatch macro; the macro name is illustrative. */
#define my_strchr(s, c) \
( \
	_Generic(s, \
		char *: my_nonconst_strchr, \
		void *: my_nonconst_strchr, \
		const char *: my_const_strchr, \
		const void *: my_const_strchr \
	)(s, c) \
)
```
alx@devuan:~/tmp$ gcc -Wall -Wextra -pedantic -S -std=c11 strchr.c
alx@devuan:~/tmp$
nullability annotations in C
>
> No, it's not. It's only “more useful” if you insist on zero-string abominations.
> If your strings are proper slices (or standalone strings on the heap…
> only C conflates them, C++, Rust and even such languages as Java and C# have separate types)
> then returning pointer is not useful.
> You either need to have generic type that returns slice or return index.
> And returning index is more flexible.
```
s.str = malloc(s1.len + s2.len + s3.len);
p = s.str;
p = mempcpy(p, s1.str, s1.len);
p = mempcpy(p, s2.str, s2.len);
p = mempcpy(p, s3.str, s3.len);
s.len = p - s.str;
```
```
s.str = malloc(s1.len + s2.len + s3.len);
s.len = 0;
s.len += foo(s.str + s.len, s1.str, s1.len);
s.len += foo(s.str + s.len, s2.str, s2.len);
s.len += foo(s.str + s.len, s3.str, s3.len);
```
nullability annotations in C
s = sdscatsds(s, s1);
s = sdscatsds(s, s2);
s = sdscatsds(s, s3);
sdsfree(s);
nullability annotations in C
> With a managed string like you propose, you're effectively blinding the compiler from all of those operations.
nullability annotations in C
> But the smaller the APIs are, the less work you impose on the analyzer, and thus the more effective the analysis is (less false negatives and positives).
nullability annotations in C
> I never cared about optimized code.
nullability annotations in C
Modern languages do that for free, though.
nullability annotations in C
And because "I always did it like this" isn't a reasoning that helps in discussions.
nullability annotations in C
>Please reconsider your language.
nullability annotations in C
[str1, str2, str3].concat().into_boxed_str()
And that's it. In a C-like language that doesn't use a "dot" to chain functions, it would be something like:
string_to_frozen_string(concat_strings(str1, str2, str3))
Or, maybe, even just:
concat_strings(str1, str2, str3)
The string.h interface is awful, as a whole.
nullability annotations in C
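For reference, the one-liner above is real, runnable Rust using only standard-library methods (the string values are made up for the example):

```rust
fn main() {
    let (str1, str2, str3) = ("foo", "bar", "baz");
    // concat() allocates a single String; into_boxed_str() freezes it.
    let s: Box<str> = [str1, str2, str3].concat().into_boxed_str();
    assert_eq!(&*s, "foobarbaz");
}
```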
nullability annotations in C
> through this pointer" when what everyone wants most of
> the time is "nothing modifies the referent while I'm
> holding this pointer", i.e. what Rust gives you.
nullability annotations in C
There was a research project at Berkeley (George Necula et al.) across several years (including 2006), apparently called Ivy (although early presentations did not use that name). The idea was that existing C code could be made safe piecewise (which requires sticking with the ABI among other things, unlike C implementations with fat pointers) by adding annotations in some places.
nullability annotations in C
My understanding (from hearing a talk by George Necula) is that the Ivy tools would complain if they do not know anything about array bounds. And that you can turn off such complaints for parts of the source code that you have not enhanced with annotations yet, like Rust's unsafe.
Note that Rust's unsafe does not turn off complaints; it gives you access to abilities that you can use unsoundly, but just adding unsafe to code that Rust rejects will not normally cause it to be accepted.
nullability annotations in C
Porting from malloc()/free() to garbage collection is easy: just delete the calls to free() (or define them as noops). There is one pathological case for conservative garbage collectors (a linked list that grows at the end where you move the root pointer along the list; any spurious pointer to some element will cause the list to leak), but it's a rare idiom.
nullability annotations in C
<https://thephd.dev/the-big-array-size-survey-for-c>
```
wchar_t *
wmemset(size_t n;
        wchar_t wcs[n], wchar_t wc, size_t n)
{
	for (size_t i = 0; i < _Countof(wcs); i++)
		wcs[i] = wc;
	return wcs;
}
```
It can still crash