UB in Rust vs C

Posted Aug 14, 2024 20:58 UTC (Wed) by mb (subscriber, #50428)
In reply to: UB in Rust vs C by ralfj
Parent article: Standards for use of unsafe Rust in the kernel

I think for most UBs in Rust there is no sane way a compiler could implement it.

UB in Rust is things like for example:
If you construct a reference from a Null-pointer, it is insta-UB. You don't even have to dereference it.

The only way a compiler could "support" this is to add a runtime check to every conversion and then panic. So effectively don't actually do the reference construction in the Null case. But that would be obviously bad for performance.

Rust tries really hard to mostly only make things UB that can't have a sane behavior.

UB in Rust vs C

Posted Aug 14, 2024 21:56 UTC (Wed) by khim (subscriber, #9252) [Link] (40 responses)

The devil is in details.

> If you construct a reference from a Null-pointer, it is insta-UB.

Yes, but why?

> You don't even have to dereference it.

But if you never dereference it then what harm could it do — would ask “we code for the hardware” guy. You may ask that it's needed to ensure that Option<&T> can be of the same size as &T, but the s/he would ask “what about references that are not ever used as Option<&T>?” and so on.

The truth is that list of UBs is, to some degree, always arbitrary. Rust tries not to include “really stupid” UBs. Like: have you even imagined that C++ has UBs (soon to be removed) in it's lexer! I mean: how deranged could you be to proclaim “if our compiler has difficulty breaking a source file into tokens, then it has carte blanche to produce whatever output it wants” ?

But still, not matter how hard you try it's always possible to invent some crazy construct that would violate UBs for seemingly good reasons (e.g. it's UB in Rust to use uninitialized memory for fun and profit even if hardware have no objections).

That means that without both sides agreeing to act in a good faith nothing could be achieved. Language developers have try to invent sane list of UBs that would be useful for writing real-world programs, but language users have to avoid UBs even in places where code with UB would be more efficient and even if they don't like these UBs!

Rust tries to keep the bargain “fair”, but many C/C++ developers feel that the right to violate that bargain is their unalienable right!

UB in Rust vs C

Posted Aug 14, 2024 22:18 UTC (Wed) by mb (subscriber, #50428) [Link] (35 responses)

>> If you construct a reference from a Null-pointer, it is insta-UB.
>Yes, but why?

It's not possible to encode such references in a sane way.
Yes, you can demand that the compiler would support it nevertheless. But then you have to pay for it.

Languages without UB exist and they are all slow for a reason.

>many C/C++ developers feel that the right to violate that bargain is their unalienable right!

Which I can actually understand and relate to, given the sheer amount of UB in these languages.
It's easy that the personal preference feels like the correct and obvious implementation to some UBs and then demand the compiler to read my mind.

WUFFS

Posted Aug 15, 2024 9:33 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (34 responses)

> Languages without UB exist and they are all slow for a reason.

No. WUFFS doesn't have any UB and it is, to quote its own description "ridiculously fast".

WUFFS pays a high price for this, but it's a price you might be able to easily afford for some projects and if so WUFFS is a no brainer. The price is generality, WUFFS is not a general purpose language, most software could not be written in WUFFS, it is only for Wrangling Untrusted File Formats Safely.

Not being a general purpose language allows WUFFS to sidestep lots of CS gotchas, which aren't really important in themselves, but signal that you'll also find it was able to solve the real nasty engineering problems those CS gotchas assure you a real general purpose language could never solve. For example, WUFFS doesn't have bounds misses.

Not "WUFFS bolts on runtime bounds checks" (as safe Rust does and many C++ proposals do) but "WUFFS doesn't have bounds misses". It's simple we "just" check at compile time that all values used for indexing into a data structure are mathematically guaranteed within the bounds and so they can't miss.

This means writing WUFFS requires a lot more mathematical rigour than you've probably ever used in programming at least since a University formal methods course, but hey, it is very fast and entirely safe.

Remembering that WUFFS exists and that other languages of this class (specialised languages suitable only for a particular purpose e.g. crunching big data sets) could be made is much more optimistic and shows us what our future might look like, even though obviously the Linux kernel couldn't be written in WUFFS.

WUFFS and the Linux kernel

Posted Aug 15, 2024 10:13 UTC (Thu) by farnz (subscriber, #17727) [Link] (4 responses)

Remembering that WUFFS exists and that other languages of this class (specialised languages suitable only for a particular purpose e.g. crunching big data sets) could be made is much more optimistic and shows us what our future might look like, even though obviously the Linux kernel couldn't be written in WUFFS

Even though you couldn't write the entire Linux kernel in WUFFS, you could design restricted languages for interesting subsets of the Linux kernel. For example, it should be possible to come up with a restricted language for handling on-disk filesystem layout, such that the resulting code will either successfully handle whatever bunch of bytes you give it, or error out nicely because the filesystem is corrupt. It should also be possible to come up with a language that validates all syscall arguments (including following userspace pointers), to protect against weird acts on the part of userspace.

Part of the point of languages like WUFFS is to change the way you develop software that handles potentially dangerous inputs; instead of trying to validate on the fly, you have a layer that maps "all possible inputs" into "either error, or expected input state". And this is valuable because humans are really bad at thinking about "all possible inputs" (there's around 2**44 possible states for my SSD's exposed storage areas, for example, and I doubt that anyone has carefully thought through how xfs will behave for every single one of those 2**44 states supplied to it as "valid XFS image, please mount"). By having the machine say "you haven't come up with an answer for the meaning of these 2**30 states, because they're neither missing a signature, nor valid", you encourage the programmer to think this through fully.

WUFFS and the Linux kernel

Posted Aug 31, 2024 15:26 UTC (Sat) by sammythesnake (guest, #17693) [Link] (1 responses)

> there's around 2**44 possible states for my SSD's exposed storage areas

You have a 44 bit SSD? [scratches head]

Did you perhaps mean something like 2**2**44 (which by my quick stab at calculating corresponds to a ~2TB drive)

WUFFS and the Linux kernel

Posted Aug 31, 2024 17:18 UTC (Sat) by farnz (subscriber, #17727) [Link]

I did indeed mean 2**(2**44), since I have a 2 TiB exposed capacity drive (no idea how much raw capacity there is) - it's a huge number of possible states. And if I understated the number, that just makes it worse - the underlying issue that WUFFS and friends aim to help you with is reducing the number of input states from "all possible bit patterns" to "valid bit patterns", by forcing you to provide functions from "possible bit pattern" to "valid bit pattern or error state".

WUFFS and the Linux kernel

Posted Aug 31, 2024 17:30 UTC (Sat) by pizza (subscriber, #46) [Link] (1 responses)

> Part of the point of languages like WUFFS is to change the way you develop software that handles potentially dangerous inputs; instead of trying to validate on the fly,

For filesystems, you still have to validate on the fly. Because there are constructs that are valid in isolation but conflict with each other. You have to validate (and maintain) the _entire_ state holistically, and that is not practical when your metadata alone can easily exceed your memory size.

Valid structures and bugs

Posted Sep 1, 2024 9:26 UTC (Sun) by farnz (subscriber, #17727) [Link]

That's part of what I mean by "change the way you think"; if you've designed your filesystem such that constructs that are valid in isolation result in kernel bugs when combined, you have a problem. If you merely have a situation where the kernel remains bug-free, but there's data loss on a corrupt filesystem due to the conflicting structures (e.g. two inodes sharing an extent record means that either inode can modify that extent record), then you don't have a problem.

And part of this is distinguishing severities of bug; a filesystem bug that gives me total control of your machine because a consequence of an impossible construct being present is that the kernel jumps to an attacker-controller address is a different type of bug to one where the filesystem, when faced with a corrupted filesystem image, corrupts it further.

WUFFS

Posted Aug 15, 2024 18:41 UTC (Thu) by khim (subscriber, #9252) [Link] (28 responses)

> No. WUFFS doesn't have any UB and it is, to quote its own description "ridiculously fast".

Today people like to put Rust (without unsafe)/WUFFS and C# (also without unsafe)/Java (without JNI modules) into the same group, but I think we need to distinguish them, somehow, because they have, in some sense, radically different approach to UB and, more importantly, to safety.

And, IMNSHO this whole story with C#/Java craze was gigantic waste of time and resources.

In some sense C# and Java tell the developer: don't think about safety, we got it covered, unsafety in the managed code just doesn't exist, while Ada, Rust and WUFFS tell the exact opposite story: hey, you have to think about safety all the time, when you write each line of code, but don't worry, we would double-check your reasoning about why your code is safe and if we couldn't understand you reasoning compiler would tell you.

That dichotomy was noticed ages ago by Tony Hoare and was distilled into his famous phrase saying: there are two ways to write code — write code so simple there are obviously no bugs in it, or write code so complex that there are no obvious bugs in it.

The majority of today's safe languages belong to the write code so complex that there are no obvious bugs in it and only care about memory safety and absence of UB (and usually achieve both via the use of heavy runtime system). And these languages are all slow for a reason. I always felt that this reason is use of tracing GC but could never imagine how alternative can look like — till Rust made it accessible to the masses (Ada was always trying to do what Rust did, but was unable to invent working solution till Rust gave it and then, of course, Ada stole it and thus closed the loophole that existed for decades).

And when I looked from where it arrived in Rust I was surprised to find out that it was lifted from the tracing-GC based language! The trajectory was, essentially: start with tracing-GC language, go after write code so simple there are obviously no bugs in it property, add affine type system to make code safer, realize that GC is now superfluous and ditch it entirely.

This have proven what I actually felt intuitively all along (tracing GC is bad solution to safety issues) but also have shown me that problem is not tracing GC per see, problem is an attempt to solve safety problems by moving them into a managed runtime (which then needs GC to avoid UB).

Ultimately only human may handle safety and unsafety (Java tutorials that I saw usually include example of how stack implementation needs to turn “erased” elements into nullptr or else GC couldn't remove them), and if compiler can verify that solution that human invents then the end result is both safe and faster than an attempt to ensure safety via the use of managed runtime (which doesn't even work!).

WUFFS is an example of next step after Rust, is some sense: it's not general-purpose solution but when it can be used it's both safer and faster.

Maybe 10 or 20 years from now Rust replacement would be made and it would adopt these ideas, but for now I think it's enough to establish that fact that language that allow you to be crazy but puts on the straitjacket with sleeves tied at the back and places you into a room with padded walls and language that tells you “what you wrote makes no sense, go and redo your code” are radically different approaches to “safe” language without UB.

WUFFS

Posted Aug 16, 2024 10:42 UTC (Fri) by paulj (subscriber, #341) [Link] (6 responses)

Isn't it the case that more Rust programmes that have to work with more complex data-models and/or deal with long-lived state end-up relying heavily on RC/ARC?

Is runtime reference counted GC better than tracing GC? (I prefer ref counting myself, but I don't think I could claim RC is always better than tracing GC).

WUFFS

Posted Aug 16, 2024 12:42 UTC (Fri) by khim (subscriber, #9252) [Link]

> Is runtime reference counted GC better than tracing GC?

You would need to define what do you mean when you are saying “better” if you want to have a meaningful answer to that quesion. So far I know one example of task where tracing GC looks like a better fit: theorem provers. First of all you, usually, have no idea, in advance, if said theorem can even be proven or not (and that means that “spurious rejectings” can be tolerated which immediately makes situation slightly unusual) and you genuinely don't know which data is still useful and which one is garbage.

But over years I only saw that one example where tracing GC is clearly the superior solution. Most of the time tracing GC is not just useless, but it's actively harmful if you goal is the Hoare property.

> Isn't it the case that more Rust programmes that have to work with more complex data-models and/or deal with long-lived state end-up relying heavily on RC/ARC?

Yes, but there's an interesting Rust-specific side to that story: Rust doesn't have fully automatic reference counting! Or, rather, Rust does half of it (it automatically decrements counter when object is no longer needed and deallocates it when counter reaches zero), but it doesn't do automatic increment for you. This works very nicely because Rust passes objects around by moving then, not by copying them, which means that you usually need to do an explicit counter increment (by explicitly calling clone and obtaining copy of you refcounter-pointer) only in places where you need shared ownership.

This, again, leaves breadcrumbs in your code that help you to understand where do you split ownership and, again, helps to achieve the Hoare property.

WUFFS

Posted Aug 16, 2024 12:54 UTC (Fri) by excors (subscriber, #95769) [Link] (4 responses)

> Is runtime reference counted GC better than tracing GC?

Depends what you mean by "better". I think the main differences are that tracing GC typically has better long-term-average performance and can handle cyclic data structures, while refcounting has more predictable performance and can have useful destructors.

The average performance is a bigger deal in languages like Python or Java where almost everything is heap-allocated and GCed, even objects that only ever have a single owner, so a lot of time would be spent needlessly updating refcounts (especially with multithreading where they need to be atomic updates). It should matter less in Rust where the programmer can choose to use Rc only for the relatively rare objects with non-trivial ownership, and the more expensive atomic Arc for the even rarer ones that are shared between threads (and Rust guarantees you'll never accidentally share a non-atomic Rc between threads). That does require some extra thought from the programmer, though.

It also takes some extra thought to avoid cyclic data structures (maybe using weak references etc), or to explicitly break cycles when destroying the container, to avoid memory leaks. (Python solves it by using reference counting plus a tracing GC to find cycles, but then you get most of the downsides of both.)

Tracing GC has much less predictable performance because it might pause all your threads, or at least eat up a lot of CPU time, whenever it arbitrarily feels like it. For batch processing, that doesn't matter. For moderately latency-sensitive tasks (like web services), you might end up spending significant effort tuning the GC heuristics to minimise pauses. For real-time, no.

It's pretty nice having destructors that are guaranteed to run immediately when the last reference to an object is dropped (which tracing GC can't do) - you can use RAII for objects that hold non-memory resources (like file handles) that the GC doesn't know about, without worrying that you'll suffer resource exhaustion while the GC thinks you've still got plenty of memory and there's no need to collect any garbage. Particularly useful for a systems programming language since you're often working with resources provided by the system, outside of the GC's scope.

The tradeoffs chosen by Rust wouldn't work for every language; but for a systems programming language aimed at relatively sophisticated programmers, I think it has made a good choice.

WUFFS

Posted Aug 16, 2024 13:07 UTC (Fri) by mb (subscriber, #50428) [Link] (3 responses)

>I think it has made a good choice.

Yeah, it's ok-ish most of the time.

Except for the case of move-capturing blocks. Passing an Arc clone to multiple async-move blocks is awkward.
It basically means cloning the Arc to a different name and then moving that name into the async block.
Which typically results in names such as foo2 or foo_cloned.
(Name shadowing is also possible in some cases.)

I think this is on the current list of things to be improved. But I'm not sure what an elegant solution would look like.

WUFFS

Posted Aug 16, 2024 19:12 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

I think that Claim[1] looks ok. It makes a different cut to help understand "cheap to Clone" because `Rc::clone` is (almost certainly) cheaper than `[u8; 16536]` despite the latter being `Copy`.

[1] https://smallcultfollowing.com/babysteps/blog/2024/06/21/...

WUFFS

Posted Aug 16, 2024 20:49 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

I actually dislike that proposal. Because it's conceptually wrong. It asserts that the only difference between copy and clone is efficiency. But in reality difference is on another level: Copy types are just chunks of memory without internal structure and their copy is entirely independent from the original while clone means something else.

To me, on the congnitive level, clone of Rc<Vec<u32>> means that we want to split ownership of that [unmodifiable] vector… which maybe is kinda-sorta-Ok (because it's fixed and unmodifiable), but I would definitely not want to see Rc<RefCell<Vec<u32>>> automatically clones and, if understand correctly, this proposal would make it automatically clone'able, too.

The whole thing is solving entirely wrong problem: if the desire is to replace this:

tokio::spawn({
    let io = cx.io.clone():
    let disk = cx.disk.clone():
    let health_check = cx.health_check.clone():
    async move {
        do_something(io, disk, health_check)
    }
})

with this:

tokio::spawn(async move {
    do_something(cx.io, cx.disk, cx.health_check)
})

then it can be fixed with addition of just one keyword (it may even not be a keyword since it always comes after async):

tokio::spawn(async clone {
    do_something(cx.io, cx.disk, cx.health_check)
})

This would make it possible for a people like me to enable a clippy lint which rejects such code (I very much do want to see when I'm sharing ownership with some other task), although I can see why some people would want to add some magic to their programs.

Thankfully this proposal also comes with #![deny(automatic_claims)] which is good enough for me: while complexity of that whole madness is a bit larger than I would like with existence of #![deny(automatic_claims)] I don't care all that much about what other people are doing about Claim.

IOW: IMNSHO this proposal is a net negative, but very mild one, not enough to throw a temper tantrum if it would be accepted.

WUFFS

Posted Aug 16, 2024 21:28 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

> then it can be fixed with addition of just one keyword (it may even not be a keyword since it always comes after async):
>
> tokio::spawn(async clone {
> do_something(cx.io, cx.disk, cx.health_check)
> })

Except I want it for plain closures too, not just `async` blocks.

Java/C# vs Rust

Posted Aug 16, 2024 12:52 UTC (Fri) by joib (subscriber, #8541) [Link] (20 responses)

> And, IMNSHO this whole story with C#/Java craze was gigantic waste of time and resources.

Well, Java was two decades before Rust, and allowed developers to write code that was safer from many common errors that plague C and C++ codebases. Yes, Java enterprise culture created a lot of ridiculousness (AbstractIntegerAdditionFactoryBean memes etc.), but still, had all the zillions of lines written in Java/C# instead been written in C/C++, the CVE database might have imploded into a black hole.

Would it have been better if Rust had been available in 1995? Yes, but it wasn't. Well, Haskell was around then and roughly as popular as today? ;)

Java/C# vs Rust

Posted Aug 16, 2024 13:29 UTC (Fri) by khim (subscriber, #9252) [Link]

> Well, Java was two decades before Rust,

Yes

> and allowed developers to write code that was safer from many common errors that plague C and C++ codebases.

No. Developers had plenty of options before Java and they still have plenty of options for these apps.

> Yes, Java enterprise culture created a lot of ridiculousness (AbstractIntegerAdditionFactoryBean memes etc.), but still, had all the zillions of lines written in Java/C# instead been written in C/C++, the CVE database might have imploded into a black hole.

But they wouldn't have been written in C/C++! There were plenty of options for these enterprise apps: from FoxPro to Visual Basic (including VBA), and more specialized ones like Pick/BASIC (that Wol raves so much all the time) or ABAP, etc.

Java and C# were sold as replacements for C/C++ (remember CorelOffice for Java and other such silly things?), it completely and utterly failed at that (and destroyed lots of products in the process, from JavaStation to Windows Phone) and then was used to program apps which were already mostly programmed in memory safe languages.

Ironic, isn't it?

Technically Java and C# replaced one memory-unsafe and quite popular language: Pascal. But Delphi wasn't the language that majority of apps were written in even if it was quite popular in some circles. And I'm not entirely sure that achievement is worth the destruction that failed attempt to replace C/C++ with Java/C# caused.

> Would it have been better if Rust had been available in 1995? Yes, but it wasn't. Well, Haskell was around then and roughly as popular as today? ;)

Haskell was never an answer, but technique that, ultimately, made Rust into what it us was, apparently, was known back in 1987.

Sure, it needed a refinement, but said refinement haven't happened precisely because the whole wold was thinking that they found the perfect solution to the safety problem (in a form of tracing GC) and tried to “plug the remaining holes in that solution”.

Only it was never a solution. Just like airbags and seat belts are not a solution for the reckless driving but automatic cameras that detect rules violation do wonder to reduce it, similarly tracing GC is not a solution for the safety issues (even if can achieve memory safety).

Java/C# vs Rust

Posted Aug 16, 2024 21:29 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (18 responses)

> Would it have been better if Rust had been available in 1995?

Rust wouldn't have been possible in 1995, simply because the compilation times and memory requirements were impossibly high for the computers of that period.

Golang would have worked, though.

Java/C# vs Rust

Posted Aug 17, 2024 2:19 UTC (Sat) by khim (subscriber, #9252) [Link] (17 responses)

> Rust wouldn't have been possible in 1995, simply because the compilation times and memory requirements were impossibly high for the computers of that period.

What exactly makes Rust extra-heavy? In my experience Rust compiler have similar memory consumption and similar compilation times to C++ compiler and C++ wasn't even all that new in 1995, Borland C++ 4.5, Visual C++ 4.0, Watcom C++ 10.5 were already released in year 1995. All of them included pretty sophisticated and non-trivial template libraries and other pretty heavy things (windows.h alone was humongous, even back then).

Sure, if Rust would have arrived in year 1995 then it would have been as slow as C++ with heavy template libraries was back then and then Linus would have rejected it, but I don't see what could have prevented Rust from improving in similar fashion to how C++ improved over time.

Java/C# vs Rust

Posted Aug 17, 2024 2:32 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (16 responses)

Rust can't compile individual files like C or C++ compilers (of that time). It can operate at most on the crate level.

> Sure, if Rust would have arrived in year 1995 then it would have been as slow as C++ with heavy template libraries was back then and then Linus would have rejected it, but I don't see what could have prevented Rust from improving in similar fashion to how C++ improved over time.

Heavy template libraries really started appearing in 2000-s, when RAM and CPUs became available. In 90-s a lot of compilers struggled with the STL. The abovementioned Watcom C/C++ did not support template members, SFINAE, and even ":" initializers in constructors.

Java/C# vs Rust

Posted Aug 17, 2024 3:31 UTC (Sat) by khim (subscriber, #9252) [Link] (15 responses)

> The abovementioned Watcom C/C++ did not support template members, SFINAE, and even ":" initializers in constructors.

I don't think it was lack of resources. From Visual Studio blog: throughout the VS 2015 cycle we’ve been focusing on the quality of our expression SFINAE implementation. That's year 2016!

It's just hard to support certain things when your compiler is not designed to support them. But the fact is that in year 1995 there already were compilers that supported pretty advanced C++ (I think Borland was the most advanced at that time, still) means Rust could have existed back then if someone would have invented it.

> Rust can't compile individual files like C or C++ compilers (of that time). It can operate at most on the crate level.

Um. That's how Turbo Pascal 4+ did starting from year 1987 and how Ada did from the day one in year 1983. Not a rocket science at all and, in fact, it reduces resources consumption not increases them: no need to parse the same thing again and again and again.

Java/C# vs Rust

Posted Aug 17, 2024 3:45 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

> I don't think it was lack of resources. From Visual Studio blog: throughout the VS 2015 cycle we’ve been focusing on the quality of our expression SFINAE implementation. That's year 2016!

gcc was not much better. It did not support member templates until 2.95 in 1999. In 1995 it did not even support type deduction properly, instead relying on "guiding declarations": https://support.sas.com/documentation/onlinedoc/ccompiler... - you had to explicitly match types, as the compiler couldn't deduce the common types.

> Um. That's how Turbo Pascal 4+ did starting from year 1987 and how Ada did from the day one in year 1983.

They don't have to do monomorphisation that requires you to keep pretty much all the stuff in RAM.

Java/C# vs Rust

Posted Aug 17, 2024 4:13 UTC (Sat) by khim (subscriber, #9252) [Link] (13 responses)

> They don't have to do monomorphisation that requires you to keep pretty much all the stuff in RAM.

Which means that it would have been Rust with Extended Pascal/Ada like generics (which would have evolved into a Swift-like generics later, most likely).

I think we are arguing about different things: you say that Rust in a form exactly like Rust was done in 2015 wasn't possible to gave in year 1995 while I say that Rust with all the important properties that matter for safety was easy to create using 1995 technology.

It wouldn't have been competitive with what we have today but it would have been as fast as Java in 1995 (Java wasn't a speed daemon back then buy any means) and it could have evolved, over time, into safe language that could be used for low-level things like Linux kernel, too.

But Java had better marketing and it also promoted write once, run anywhere myth thus it was chosen. And we had to wait 20 years for a something that's safer than what we had in a year 1981.

Java/C# vs Rust

Posted Aug 17, 2024 10:28 UTC (Sat) by ralfj (subscriber, #172874) [Link] (12 responses)

Rust is building on a bunch of academic Programming Languages work that just wasn't done yet in the 90s. For instance, it has taken a lot of good ideas from Cyclone.

So independent of computing resources, I think it's unlikely something like Rust could have happened in 1995.

Java/C# vs Rust

Posted Aug 17, 2024 12:04 UTC (Sat) by khim (subscriber, #9252) [Link] (11 responses)

> So independent of computing resources, I think it's unlikely something like Rust could have happened in 1995.

Oh, sure. Rust in 1995 could have been a reality only if IT industry would have picked write code so simple there are obviously no bugs in it way to resolve the software crisis.

But in reality said crisis was resolved via write code so complex that there are no obvious bugs in it which, of course, made creation of Rust in 1995 impossible.

It's ironic that around that time (in 1996 to be exact) Hoare wrote his article where he expressed his satisfaction with the fact that this approach was seemingly working.

And, of course, when everyone was to busy piling layers upon layers of scotch and bailing wire there weren't enough people to do research that could have given us Rust in 1995.

We needed two more decades to realize that while “let's pile layers upon layers of scotch and bailing wire till our creations would stop crashing every few hours” approach works to create something that is useful, but then it doesn't work in the face of adversary.

Thus yes, Rust wasn't a possibility in year 1995, but not because of hardware, rather the social issues prevented it's creation, everyone was just too busy inventing various snake oil solutions which would make programs, magically, correct even when people who write them have no idea what they are doing.

But hardware? Hardware was perfectly “kinda ready” for it: Java was insanely heavy and slow by standards of 1995 and Rust would have been a pig, too, but Rust could have become fast when computers would have become more advanced for more optimizations to become possible, same as happened to Java.

That's why yes, C#/Java craze was gigantic waste of time and resources — but also, probably, inevitable one. World needed these trillion dollars loses to admit that this was the wrong turn, before that happened (as even Hoare, himself, noted) it looked as if enough layers of scotch and bailing wire may fix everything.

Java/C# vs Rust

Posted Aug 17, 2024 17:22 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (10 responses)

> That's why yes, C#/Java craze was gigantic waste of time and resources

I disagree. Java in particular has shown that large and complicated software can be written in memory-safe languages. This was not at all a given in 90-s.

And of course, the Java ecosystem had struggled a lot to formulate the best practices.

If anyone wants to be amazed by over-engineering, just look at the EJB 1.0 standard. But even with all of its over-engineering, EJB containers like WebLogic or JBoss pioneered some of the best practices that we use even now: artifact-based deployments, monitoring and metrics, centralized logging, and even a notion of proto-containers (WARs and EARs). All starting back in 1998-1999.

Over time, the bad parts were discarded, and good parts were promoted. It provided a great learning experience for the whole industry. Would have it been better if the industry magically got all this foreknowledge back in 1999 and avoided painful diversions into the AbstractFactoryFactoryBean territory? Sure. Could it have happened this way? Not a chance.

Java/C# vs Rust

Posted Aug 18, 2024 8:43 UTC (Sun) by khim (subscriber, #9252) [Link] (9 responses)

> Over time, the bad parts were discarded, and good parts were promoted.

And it's just a sheer coincidence that “bad part” are exclusive for Java but “good part” are not? Really?

> But even with all of its over-engineering, EJB containers like WebLogic or JBoss pioneered some of the best practices that we use even now: artifact-based deployments, monitoring and metrics, centralized logging, and even a notion of proto-containers (WARs and EARs).

Pionered? In what sense? They raised hype around things invented by other, that's all. Syslog doesn't need Java and chroot was invented before it, too. IBM did remote monitoring for decades before Java was ever invented and Google's Borg never needed or used Java (it supported it, though) with Sawzall existing outside of Java, too.

> Java in particular has shown that large and complicated software can be written in memory-safe languages. This was not at all a given in 90-s.

Lisp machines already existed and they have shown that before Java was even dreamed of.

I couldn't name any single goal that Java set before itself and made better then some other language. Even cross-platform development today is mostly happening in JavaScript/TypeScript and not in Java.

C#/Java failed on all goals they set out to deliver (initial goal was to replace C/C++, remember? Javastation was supposed to only run Java applications and Avalon was supposed to replace Win32, not complement it).

C#/Java were pure waste of time and extremely disruptive for the companies that embraced them: Sun is dead is Microsoft have lost much of it's influence, because time that they wasted on unattainable goals was used by others to leapfrog them. If you recall the behaviour of “old” Microsoft then this result is 100% positive and it's true that only C#/Java could have achieved that, but somehow I seriously doubt it was intended.

It would be really funny if Rust would, eventually, replace C/C++. Because it's developers have never embraced that as a goal. There are lots of jokes about “rewrite it in Rust” and some fanboys even say then Rust have to replace C/C++, but actual developers are realists and always designed Rust for the perpetual coexistence with C/C++. On the contrary: Java and C# were designed for the world where everything (except for some low-level components) is in managed code, all development as happening within confines JVM/CLR (basically: commercializaion of List machines concept) and all software is rewritten in managed code. That vision failed utterly and miserably and have only consumed countless resources.

You may point out to the success of Android, but if you read interviews with Andy Rubin, you'll see that Android embraced Java not because of it's wonderful properties, but simply because when Android was made there were lots of Java developers. If Java detour would have never happened he would have picked something else (Apple picked up Objective C because macOS uses it and it worked well for them).

Ultimately the only thing that C#/Java detour us is that world without managed code is viable, while world with only managed code is not. Anyone with two brain cells could have predicted that on the day one, but Google have done a third attempt which is, as expected, falling apart before our eyes.

> Could it have happened this way? Not a chance.

Not in a world obsessed with attempt to make write code so complex that there are no obvious bugs in it work, sure.

Java is very much symptom, not a disease. Disease is that naïve belief that you may replace competence with tools. Planes that are losing doors and AbstractFactoryFactoryBeans come from the same root. And yes, when you traverse that road then C#/Java happens narturally. I just wish we stopped and realized that this is a wrong road to traverse before so much resources would have been wasted.

Java/C# vs Rust

Posted Aug 18, 2024 12:27 UTC (Sun) by pizza (subscriber, #46) [Link] (8 responses)

> I couldn't name any single goal that Java set before itself and made better then some other language.

*single* goal, no. But these goals didn't exist in isolation, and until Java came along, nothing else put them all of those "single goals" together in a way that was both accessible at the entry level (in part due to running on commodity systems) and useful at the high end.

Java/C# vs Rust

Posted Aug 18, 2024 12:47 UTC (Sun) by khim (subscriber, #9252) [Link] (7 responses)

It looks like an attempt to draw a target around the place where arrow hit.

Of course if you spend enough time reframing Java achievements then you can always find a way to define them in way to show that Java did something first.

But premise was to achieve something pretty concrete, billions (if not trillions!) of dollars were spent in an attempt to deliver that and none of the achievements that may last needed these things.

Sure, Java have shown that you may develop things in a managed code, but List machines did that before Java. Java have shown that you may write portable code but AS/400 did that before.

None of achievements that Java may show are new and the ones that are new are completely unrelated to these things that Java was supposed to achieve.

It's like these solar and wind power stations or electric cars: sure, they advanced certain narrow fields significantly, but does the damage they did to the world economy (and, ironically, to the world ecology) are worth it?

This question is debatable but the fact that original promise of “self-sustainable development” wasn't achieved and wouldn't be achieved on this path remains. I suspect that it would be “achieved”, in the end, via sleight of hands when nuclear power would be declared “green”, too and then everyone would be happy about how “green” power works while completely forgetting decades of investment into a dead end.

Java/C# vs Rust

Posted Aug 18, 2024 12:58 UTC (Sun) by pizza (subscriber, #46) [Link] (6 responses)

> Sure, Java have shown that you may develop things in a managed code, but List machines did that before Java. Java have shown that you may write portable code but AS/400 did that before.
>None of achievements that Java may show are new and the ones that are new are completely unrelated to these things that Java was supposed to achieve.

Congratulations, you just demonstrated my point.

Lisp machines and AS/400s were about as far removed from commodity systems as it could get, and were effectively unobtanium for mere mortals.

...Unlike Java, which you could obtain for low-to-zero cost and could run on the systems you already had.

Like it or not, Java's particular combination of features and accessibility changed the entire trajectory of the industry.

Java/C# vs Rust

Posted Aug 18, 2024 13:35 UTC (Sun) by khim (subscriber, #9252) [Link] (5 responses)

> Lisp machines and AS/400s were about as far removed from commodity systems as it could get, and were effectively unobtanium for mere mortals.

And how is that a bad thing? They were solving programs that either are not needed by mere mortals or are not solved by Java.

Java managed to [ab]use managed code but failed to achieve thing that managed code is really good for: forward compatibility. While AS/400 doesn't even give you an ability to execute in native code it's not uncommon for a Java program to code with it's own version of JRE because it may misbehave with any other version and conversion between .NET 1.x, 2.x and .NET core is pretty damn non-trivial.

Thus, in the end, C# and Java achieved intermediate goals these exotic systems achieved in pursit of worthwhile goals yet failed to achieve anything worthwhile outside of hype train.

> Like it or not, Java's particular combination of features and accessibility changed the entire trajectory of the industry.

Isn't that my original point? Java have diverged the industry, made it waste trillions of dollars on mirage that never materialized, sent it into a dead end, and now we would need to spend more trillions of dollars to undo all that damage.

Hardly something to celebrate.

Java/C# vs Rust

Posted Aug 18, 2024 13:58 UTC (Sun) by pizza (subscriber, #46) [Link] (4 responses)

> Java have diverged the industry, made it waste trillions of dollars on mirage that never materialized

...So billions of lines of inherently memory-safe code deployed onto commodity systems never happened?

Java/C# vs Rust

Posted Aug 18, 2024 14:55 UTC (Sun) by khim (subscriber, #9252) [Link] (3 responses)

> ...So billions of lines of inherently memory-safe code deployed onto commodity systems never happened?

Of course they happened! Billions of lines of code already written is various memory-safe language (from COBOL and Clarion to Visual Basic and FoxPro) were, with great pains, rewritten in two other, more resource hungry memory safe languages.

Yes, these languages never have been using “managed code” or “tracing garbage collection”, but they were perfectly memory safe and they worked perfectly fine.

No matter how you look on it the whole thing it looks like a net negative to me: we haven't gotten any tangible benefits from that rewrite (although some industries have gotten rich, that's true, but that's like burning the house to heat a stew), code that was before than “grand revolution” written in memory safe languages was still written in memory safe languages (only now with lots of more unneeded complexity) and code written in non-memory safe languages continued to use non-memory safe languages.

Java/C# vs Rust

Posted Aug 19, 2024 12:48 UTC (Mon) by pizza (subscriber, #46) [Link] (2 responses)

> No matter how you look on it the whole thing it looks like a net negative to me: we haven't gotten any tangible benefits from that rewrite

Wait, _rewrite_? Surely you jest.

New stuff only rarely replaces the old stuff; instead it's layered on top. It's turtles all the way down.

And again, it is a simple FACT that Java is vastly more approachable than the stuff it supplanted, and was useful for nearly everything, from deeply embedded stuff [1] to teaching/toy problems [2] to desktop applications [3] to enterprise consultant wet dreams -- All from the same base tooling. That was a *HUGE* change over the former status quo.

Sure, many of its use cases have since been better served with newer stuff. So what? Isn't that the fate of all technology?

[1] Multiple generations of ARM processors ran the Java bytecode natively
[2] Completely supplanting Pascal in introductory programming courses
[3] Including browser applets. Which I don't miss.

Java/C# vs Rust

Posted Aug 19, 2024 13:39 UTC (Mon) by khim (subscriber, #9252) [Link] (1 responses)

> New stuff only rarely replaces the old stuff; instead it's layered on top. It's turtles all the way down.

> And again, it is a simple FACT that Java is vastly more approachable than the stuff it supplanted

Can you stop contradicting yourself at least in two adjacent sentences?

> Multiple generations of ARM processors ran the Java bytecode natively

Nope. Few ARMv5 CPUs has the ability to run some small subset of Java bytecode. It was, basically, to run few games on some phones and for nothing else. Starting from ARMv6 only “null” implementation of Jazelle is supported.

So that's another example of pointless waste (thankfully very limited compared to the damage caused by the large C#/Java crazyness).

> Completely supplanting Pascal in introductory programming courses

Yeah. And also Scheme in some course. Another negative.

> That was a *HUGE* change over the former status quo.

Where do you see me objecting? Sure, C#/Java caused lots of changes. Almost all of them negative.

But you are arguing as if I'm objecting about magnitude of change… I'm not! C#/Java caused absolutely huge negative change.

There were, also, some minuscule positive changes, sure, but compared to problems that C#/Java craze caused they are hard to even notice.

> Sure, many of its use cases have since been better served with newer stuff.

That's not important. What is important is that almost all use cases are better served by older stuff.

> Isn't that the fate of all technology?

Sure. But C#/Java is different: that's the rare case where bad technology was replaced with worse one.

Wanted to say that it's the only such change, but nope, there are many other like that: solar and win power plants, electric cars, etc. That have only started happening recently. About quarter century ago. But, sadly, it's not limited to IT and not limited to C#/Java. True.

But the fact that was achieved by temporary disconnect between feasibility of technology and availability of funding doesn't make that change good and we would pay for that stupidity, and, it looks like, rather sooner than later.

Microsoft and Sun have already paid the price, but I doubt it would be limited to that.

Maybe this is enough?

Posted Aug 19, 2024 14:17 UTC (Mon) by corbet (editor, #1) [Link]

So this has gone on for quite some time; I don't think any minds will be changed at this point. Maybe time to wind it down?

Thank you.

UB in Rust vs C

Posted Aug 15, 2024 0:33 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

The "UB? In my lexer?" paper should probably be taken more as a sign of two specific cultural phenomena that of C++ actually intentionally having nonsensical design.

1. WG21 members and the C++ community generally trend old. The "Centipedes? In my vagina?" meme dates back to 2007. So the meme referenced is almost old enough to vote. This isn't a lone example, another bug fix paper "Dude, where's my char?" refers to a stoner movie from 2000 for example.

2. The ISO document is not in great shape. Like I said, these aren't real design choices, they're bugs, long standing, trivial bugs in the core document that is supposedly the purpose of the committee's existence. The reason they've persisted is that few people read the document and essentially nobody cares what it means. People employed to write C++ compilers use the drafts, because of course they do, an expensive PDF is worse *and* outdated, whereas the draft is current and readily accessible in many formats. But the draft also insists it is "incomplet and incorrekt" (sic) so why would you report any defects you do notice?

UB in Rust vs C

Posted Aug 15, 2024 6:40 UTC (Thu) by ralfj (subscriber, #172874) [Link] (2 responses)

> (e.g. it's UB in Rust to use uninitialized memory for fun and profit even if hardware have no objections).

Yeah, this is one of the major examples of something you can do in hardware but not in Rust (or C/C++). Such examples are always a justification for at least considering a language extension. Rust may one day expose an operation like LLVM `freeze` that makes it possible to write such code without UB. The mere existence of such an operation has its downsides, but I still expect it to happen. The discussion for that is at https://github.com/rust-lang/rfcs/pull/3605.

UB in Rust vs C

Posted Aug 15, 2024 8:32 UTC (Thu) by taladar (subscriber, #68407) [Link] (1 responses)

Your link is broken by the period at the end of your sentence https://github.com/rust-lang/rfcs/pull/3605

UB in Rust vs C

Posted Aug 16, 2024 7:14 UTC (Fri) by ralfj (subscriber, #172874) [Link]

Ah, sorry for that, I got too used to systems that exclude trailing periods from auto-linking. It seems I can't edit my post here either. Thanks for posting the fixes link!

UB in Rust vs C

Posted Aug 16, 2024 10:08 UTC (Fri) by intelfx (subscriber, #130118) [Link] (10 responses)

> If you construct a reference from a Null-pointer, it is insta-UB. You don't even have to dereference it.
>
> The only way a compiler could "support" this <…>

I will admit I really don’t follow this concept. Why, exactly, is constructing a reference from a null pointer an “insta-UB”?

There is a lot of talk that “Rust does not create UBs from thin air” and that every UB in Rust is “sane” and actually needed to achieve some desirable thing; but what exactly is being achieved by this (as compared to only declaring *dereferencing* an invalid reference an UB)?

UB in Rust vs C

Posted Aug 16, 2024 11:38 UTC (Fri) by mb (subscriber, #50428) [Link]

There's no way to encode Null in a &T reference. It would effectively become Option<&T>, which is a different type.
So you can't construct &T from Null, because it can't hold the value.
The thing is impossible at construction time already.

In contrast to that, creating a raw pointer from anything can always hold the value. But you might not be allowed to deref it, because it's not pointing to valid memory, is unaligned or whatever. But the pointer *itself* would hold the intended bit pattern.
(This property is even exploited in some areas https://doc.rust-lang.org/std/ptr/fn.dangling.html)

UB in Rust vs C

Posted Aug 16, 2024 12:22 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

> but what exactly is being achieved by this (as compared to only declaring *dereferencing* an invalid reference an UB)?

Effective fix for the billion dollars mistake, essentially. References, in Rust, couldn't be null, attempting to create such a reference is an instant UB, but Option<&T> can hold None and, more importantly, it's guaranteed that in-memory representation for None in Option<&T> is the exact same thing as null in pointer and it's even guaranteed that it would be the same as null in pointer used by C on that platform!

That means that if you faithfully map nullable pointers to Option<&T> and non-nullable ones to &T then both Rust developers and Rust compiler would know what do you mean (if your function receives &T then you know that checks are not needed, object would be there, 100% guaranteed by the language, and if your function receives Option<&T> then you have to perform that check or else you couldn't dereference it, again language guarantees that).

That's really valuable property and to uphold it an attempt to push null into &T was declared “an instant UB”.

Note that currently even creation of non-null dangling reference is considered UB but that one is under intense debate: it enables some valuable optimizations, but that means that sometime you have to create valid objects from the “thin air”, etc. Before the final decision would be reached it's declared as “currently UB” because adding UB to the language is breaking change and removing it is not and and since it's not entirely clear why would someone need to create a dangling reference (most of the time you may just create a dummy object and pass around reference to that object when needed) it's kept as UB for now.

But that one is debated while attempting to shove null into reference just means you need Option<&T> in that place and it's better for everyone that you would just go and fix the code instead of begging for the dangerous (and pointless) changes to the language.

UB in Rust vs C

Posted Aug 16, 2024 12:49 UTC (Fri) by khim (subscriber, #9252) [Link]

Wrong link. The billion dollar mistake is this one, but I'm pretty sure you saw it already.

UB in Rust vs C

Posted Aug 19, 2024 0:34 UTC (Mon) by intelfx (subscriber, #130118) [Link]

> tive fix for the billion dollars mistake, essentially. References, in Rust, couldn't be null, attempting to create such a reference is an instant UB

Here you're just restating the question and handwaving vigorously. This is not an answer.

>but Option<&T> can hold None and, more importantly, it's guaranteed that in-memory representation for None in Option<&T> is the exact same thing as null in pointer and it's even guaranteed that it would be the same as null in pointer used by C on that platform!

Okay, yeah, so if I correctly understand what you are trying to say here, it's to make niches optimization possible. I didn't think of it.

Null reference as insta-UB

Posted Aug 16, 2024 13:29 UTC (Fri) by farnz (subscriber, #17727) [Link] (5 responses)

There's a theoretical reason, and a practical reason.

First, the theoretical reason: a reference has a validity constraint that it always, unconditionally, refers to a valid place. If you permit a reference to be "null", you now have to change the validity constraint to say that the reference either refers to a valid place, or is null; this is Hoare's "billion dollar mistake". There's a whole pile of things that pile up behind this change; it's not a trivial thing to do, since it affects the semantics of the entire language.

The practical reason is around optimization: if a reference must point to a valid place, then it's always OK to access the place it points to; that, in turn, means that you can write clear code, and have the compiler optimize it to the best code. It can, for example, change a conditional read of a reference to an unconditional read and a conditional use at all the places you had a conditional read, without having to consider the possibility that the conditional use was protecting against the reference being null. And because the compiler is aware of the memory model, it can issue the read much earlier, knowing that there are no memory accesses later in the function that can affect either where the reference points to, or what the value read can be.

Null reference as insta-UB

Posted Aug 21, 2024 20:38 UTC (Wed) by riking (subscriber, #95706) [Link] (4 responses)

Note: "references must point to a valid instance of the object" is actually the safety invariant. The validity invariant is "initialized, non-null, aligned to the alignment of the object".

(What does that mean? It means that unsafe code can temporarily hold references that don't point to valid objects as long as it's careful what it does with them (doesn't try to read) and doesn't let the reference escape into safe code not controlled by the author of the unsafe code.)

Null reference as insta-UB

Posted Aug 21, 2024 21:14 UTC (Wed) by mb (subscriber, #50428) [Link] (2 responses)

But this is only true for "initialized". Even unsafe is not allowed to construct null-references.

Null reference as insta-UB

Posted Aug 21, 2024 22:07 UTC (Wed) by riking (subscriber, #95706) [Link] (1 responses)

The validity invariant is the things that unsafe code can't ever do. The safety invariant is the things it can be careful about and can't let escape to uncontrolled safe code.

Null reference as insta-UB

Posted Aug 21, 2024 22:22 UTC (Wed) by mb (subscriber, #50428) [Link]

Ok, I guess I don't understand your original posting then.

Null reference as insta-UB

Posted Aug 22, 2024 8:11 UTC (Thu) by farnz (subscriber, #17727) [Link]

That's why I said references must point to a valid place, not a valid instance. It's entirely permissible for the place that's pointed at to not be a valid instance, as long as it's a valid place for the referent type to live in.