Preventing data races with Pony

By Daroc Alden
January 3, 2025

The Pony programming language is dedicated to exploring how to make high-performance actor-based systems. Started in 2014, the language's most notable feature is probably reference capabilities, a system of pointer annotations that gives the developer fine manual control over how data is shared between actors, while simultaneously ensuring that Pony programs don't have data races. The language is not likely to overtake other more popular programming languages, but its ideas could be useful for other languages or frameworks struggling with concurrent data access.

Pony was primarily designed by Sylvan Clebsch, who had attempted to write an actor framework in C and C++ while employed at a financial firm. According to Clebsch's description of the motivation for Pony, that system was fast and useful to the company, but it was also plagued by constant bugs:

Over and over again, programmers ran into memory errors. And not just the usual problems with dangling pointers (premature free) and leaks (postmature free?) but persistent problems with data-races.

That experience got him interested in how to do better, and he began a several-year quest to read relevant academic papers and try to synthesize them into a cohesive whole. Clebsch ended up starting a Ph.D. at Imperial College London, where he met other people interested in working on the same problem. They built the system that would become Pony, and released the code under the BSD 2-clause license in 2015, which attracted many more contributors.

Actors

In Pony, a program consists of a set of actors: independent units of execution that own their memory and communicate by exchanging asynchronous messages. Encouraging programmers to break their programs up into multiple actors makes Pony programs well-suited to running on multithreaded systems; the Pony runtime uses one OS-level thread per available CPU, and schedules actors across these threads automatically. It also means that Pony programs don't have global state — all data is owned by one actor in particular, which is important to Pony's data-race-safety guarantees. Declaring an actor in Pony looks like this:

    actor Aardvark
      let name: String
      var _hunger_level: U64 = 0

      new create(name': String) =>
        name = name'

      fun get_hunger_level(): U64 =>
        _hunger_level

      be eat(amount: U64) =>
        _hunger_level = _hunger_level - amount.min(_hunger_level)

Actors are somewhat similar to the objects of object-oriented programming. Indeed, Pony also has classes (although it doesn't support inheritance). The above example of an actor declaration, adapted from the Pony tutorial, looks similar to a class. The difference is that only an actor can call its own methods (indicated with the fun keyword). All that the code outside the actor can do is invoke "behaviors" (indicated with the be keyword), which run asynchronously and don't return results. Invoking a behavior sends a message to the actor, which will pick it up and execute the behavior whenever it is next scheduled. Calling a method is synchronous and runs in the same thread, but invoking a behavior is asynchronous, and may run in a different thread. Unlike private methods in other languages, one Aardvark instance can't even call methods on another Aardvark, only invoke a behavior.

This means that code from the same actor can never be run from multiple threads at the same time (although actors can be migrated between threads by the scheduler). So, unlike object-oriented languages like Java, there is never any need for synchronization within an actor. Each actor has a queue of behaviors that have been invoked, which it processes one at a time, sequentially. If one actor can't keep up with the stream of messages, the programmer can instantiate multiple actors of the same type, and distribute messages between them.
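Pony builds this queueing into the language and its runtime. Purely as an analogy, the following single-threaded C++ sketch models one actor's mailbox. The class layout, the run_pending() function standing in for the scheduler, and the hunger_level() accessor are all assumptions made for illustration; this is not how the Pony runtime is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for an actor: its state is private and only changes
// when a queued message is processed.
class Aardvark {
public:
    explicit Aardvark(std::uint64_t hunger) : hunger_level_(hunger) {}

    // Queued messages capture `this`, so copying is disabled.
    Aardvark(const Aardvark&) = delete;
    Aardvark& operator=(const Aardvark&) = delete;

    // Analogue of a behavior: enqueue the work and return immediately.
    void eat(std::uint64_t amount) {
        mailbox_.push([this, amount] {
            hunger_level_ -= std::min(amount, hunger_level_);
        });
    }

    // Stand-in for the scheduler: handle queued messages one at a time.
    // Returns the number of messages handled.
    std::size_t run_pending() {
        std::size_t handled = 0;
        while (!mailbox_.empty()) {
            auto message = std::move(mailbox_.front());
            mailbox_.pop();
            message();
            ++handled;
        }
        return handled;
    }

    // Exposed only so that the sketch can be inspected.
    std::uint64_t hunger_level() const { return hunger_level_; }

private:
    std::uint64_t hunger_level_;
    std::queue<std::function<void()>> mailbox_;
};

// Usage: two messages are queued, then processed in order.
std::uint64_t demo_aardvark() {
    Aardvark a(10);
    a.eat(3);
    a.eat(20);
    assert(a.hunger_level() == 10);  // nothing has run yet
    a.run_pending();
    return a.hunger_level();         // 10 - 3 = 7, then 7 - min(20, 7) = 0
}
```

Calling eat() only enqueues work; the state changes when the mailbox is drained, one message at a time, which is why nothing inside the actor needs a lock.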

That naturally raises the question of synchronization between actors. Other actor-based systems tend to follow one of two approaches: requiring data to be copied between actors, so that there is no shared data at all, or relying on the programmer to use appropriate locking. Erlang, perhaps the most famous actor-based language, does the former. Unfortunately, copying data between actors has a serious performance penalty, especially for read-only data that wouldn't be subject to a data race in any case. Many actor frameworks for other languages do the latter, which loses a lot of the benefits of an actor-based model compared to just using threads. Pony does neither.

Reference capabilities

Pony's approach is to introduce six different kinds of pointers. That may initially sound like overkill, but the system lets a Pony program model precisely when and how data can be accessed from multiple actors. For example, immutable data that is passed between actors never needs to be copied; the Pony runtime can just pass a pointer, and then access that memory from another thread without synchronization. The six kinds of reference capabilities are:

Isolated (iso): A unique pointer, for which there are no other references. Because there are no other references, it's safe to send to another actor, giving up access in the process, even though isolated values are mutable.
Value (val): Immutable data that cannot be changed by any actor, and is therefore safe to share.
Reference (ref): A normal, non-unique pointer that supports reading and writing. Since the pointer is not unique, references cannot be sent to another actor, as that might create a data race.
Box (box): A read-only reference. Some other reference might modify the data, but this one cannot. Boxes are mainly used to abstract over whether code is working with a value or a reference.
Transition (trn): A unique writable pointer to an object that may also be pointed to by some read-only pointers (boxes). Unlike reference pointers, a transition pointer can later be turned into a value pointer in order to freeze the object, since it is unique.
Tag (tag): A pointer that does not support reading or writing, but that can be stored in data structures and compared for equality with other pointers. Tags are also used for referencing other actors, and the type system allows sending messages using a tag, even though that does require special handling from the runtime.

Isolated, value, and tag pointers can all be sent between actors, because they can't be used to construct data races. References, boxes, and transitions can all be used within an actor, but not sent between them, because they could allow one actor to write to a piece of data that another actor could read. Multiple writable pointers within an actor can't cause a data race, because the code inside an actor executes synchronously.
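As a rough analogy only, the sendable kinds resemble two familiar C++ idioms: an isolated pointer behaves somewhat like a std::unique_ptr that is moved into a message, and a value pointer somewhat like a std::shared_ptr to const data. The sketch below assumes hypothetical Message and Receiver types. Unlike Pony, C++ does not reject a later use of the moved-from pointer at compile time.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type for this sketch.
struct Message {
    std::unique_ptr<std::vector<int>> scratch;  // "isolated": one owner, mutable
    std::shared_ptr<const std::string> label;   // "value": shared, immutable
};

// Hypothetical receiving actor; it becomes the sole owner of `scratch`.
struct Receiver {
    std::vector<Message> inbox;
    void send(Message m) { inbox.push_back(std::move(m)); }
};

// Returns true if the sender lost access to the isolated buffer by sending it.
bool sender_gave_up_access() {
    Receiver r;
    auto scratch = std::make_unique<std::vector<int>>();
    scratch->push_back(1);
    auto label = std::make_shared<const std::string>("shared label");

    r.send(Message{std::move(scratch), label});

    // The receiver may mutate its isolated data freely.
    r.inbox.front().scratch->push_back(2);
    assert(r.inbox.front().scratch->size() == 2);

    // The immutable value is still readable on both sides.
    assert(*label == *r.inbox.front().label);

    // A moved-from std::unique_ptr is guaranteed to be null.
    return scratch == nullptr;
}
```

The difference is one of enforcement: in C++ the null check happens at run time, if at all, while Pony's type system refuses to compile code that touches an isolated pointer after it has been given away.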

When an object is initially instantiated, the constructor gives the calling code back an isolated pointer that can be used to modify the object, call methods, etc. The pointer can be passed to another actor (which requires the currently executing actor to give up access to the pointer), or converted into one of the other kinds of reference capability. Most of the kinds can be converted into each other under the right circumstances — for example, any other type can be converted into a tag. Common conversions are to a value pointer to make the object immutable, or to a reference to use internally without sharing. Boxes and transitions are less commonly used, and mostly show up in generic library code. References to other actors are tags, and can be used to invoke behaviors.
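The conversion from a unique, writable pointer to an immutable, shareable one has a loose C++ counterpart as well. The following sketch uses a hypothetical freeze() helper and a Buffer alias, both invented for illustration. In C++, const can be cast away, so this models the idea rather than the guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<int>;

// Give up a unique, writable handle in exchange for a shareable read-only one.
std::shared_ptr<const Buffer> freeze(std::unique_ptr<Buffer> unique) {
    assert(unique != nullptr);
    return std::shared_ptr<const Buffer>(std::move(unique));
}

std::size_t demo_freeze() {
    // Build the data while it is still uniquely owned and mutable.
    auto building = std::make_unique<Buffer>();
    building->push_back(1);
    building->push_back(2);

    std::shared_ptr<const Buffer> frozen = freeze(std::move(building));
    std::shared_ptr<const Buffer> another_reader = frozen;  // readers can multiply

    // frozen->push_back(3);  // would not compile: the pointee is const
    return another_reader->size();
}
```

Because the writable handle was unique before the conversion, no writer is left once it has been consumed, which is the property that makes the frozen data safe to share.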

These six reference capabilities are structured to be as flexible as possible while still upholding one key requirement: no actor can write to anything that a different actor can read. Then, the runtime system just has to ensure that sending a message between actors is enough of a synchronization barrier to ensure that reads from an actor will see all of the previous writes to the data. With these properties in place, it's impossible to construct a data race in Pony. In fact, the standard library doesn't even include locks, since they wouldn't be of any use to a Pony program.

The whole system is somewhat like Rust's lifetime tracking, in that it is a compile-time analysis that prevents multiple threads from having mutable pointers to the same data. Unlike Rust programmers, however, Pony programmers only have to worry about six specific types of pointer, instead of arbitrary lifetimes. The system can also be used in ways that Rust's lifetime-annotated references cannot. For example, it's easy to build doubly linked lists in Pony using transition and box pointers.

Dealing with garbage

No one solution is ever perfect, however. Pony's message passing is safe, performant, and reasonably simple to reason about. The price it pays is garbage collection. Since Pony neither tracks lifetimes nor leaves memory management up to the programmer, it relies on run-time garbage collection. Unlike in many other garbage-collected languages, however, this doesn't cause noticeable latency spikes.

Other languages have experimented with advanced concurrent, pause-less garbage collectors. But all of them have some pathological cases where garbage can be created faster than it is collected. Pony takes the comparatively simple approach of using a plain mark-and-sweep collector. The trick is that each actor is responsible for its own garbage collection, so the whole program never has to pause at once. Actors also never do garbage collection while they are executing a behavior, and instead perform collections in between processing messages.

Since mutable data can't be shared between actors, even though multiple actors can be referencing a piece of data, the only reference cycles are within an actor's own private data. So actors can treat any external references to their data as being in-use for the purposes of garbage collection, without causing any unreclaimable cycles. This keeps the marking phase of the collector simple, and avoids needing to introduce any synchronization barriers into the garbage collector. Overall, this system results in a remarkably flat latency profile.

Trying it out

The Pony web site includes documentation and tutorials on the language, as well as a playground that runs Pony in the browser, for those who want to try it out without installing. As with Godbolt's Compiler Explorer, the Pony playground will even show the assembly code produced by the Pony compiler.

The project suggests that anyone wishing to install Pony on Linux use the ponyup script (the installation instructions for which sadly involve piping a downloaded script directly to bash), which lets the user easily install and test multiple versions. I ran into some problems using the script on Fedora 41, but it turned out that the Pony release for Fedora 39 worked just fine on Fedora 41. The Pony compiler does require the gold linker from the GNU binutils project, as well as a C compiler supporting at least C11, so those may need to be installed separately.

Once all of the necessary prerequisites are installed, running ponyc in a directory containing Pony source code produces a single, dynamically-linked, native executable. The choice to have the compiler search out and compile all the .pony files in a directory is a bit unorthodox, but it does have the advantage of simplicity. Pony only distributes pre-built releases for x86_64, but does support building for other architectures. The project also recommends building the Pony runtime library from source for the specific microarchitecture that one's application will run on to achieve the best performance.

Conclusion

Pony is not necessarily the right tool for any particular application. For one thing, actor-based systems are mostly best suited to long-lived applications that need to process many requests in parallel. For another, it is driven by a small team of volunteers, which means that it doesn't have nearly as much support available as some other languages. On the other hand, Pony does welcome outside contributions, including new language features, so it might be more adaptable than more established languages. It also has a fairly limited standard library, and hasn't even officially reached a 1.0 release. (The most recent release is version 0.58.7, released on November 30.)

But despite that, Pony's ideas seem like they could be more widely applicable. Most existing languages rely on manual locking, unnecessary copying, or some combination of the two to prevent data races. Copying Pony's approach of simplified compile-time enforcement of a small set of rules for pointers encoded in the type system seems like it could present a safer, more performant alternative.

Comparison to Go?

Posted Jan 3, 2025 15:46 UTC (Fri) by willy (subscriber, #9762)

Thanks for comparing to Rust. As I was reading this article, Pony feels more comparable to Communicating Sequential Processes, so Go might be a better comparison than Rust?

(I am a mere dabbler in programming language theory, so this might be a very stupid thing to say)

Comparison to Go?

Posted Jan 3, 2025 16:20 UTC (Fri) by khim (subscriber, #9252)

Go doesn't prevent data races. Rust does. That's why comparison to Go doesn't make much sense: the core thing that's under discussion is missing.

Comparison to Go?

Posted Jan 3, 2025 16:50 UTC (Fri) by daroc (editor, #160859)

Yes, that's why I chose the comparison.

But Willy does have a good point that Pony's model is more similar to Go's than to Rust's, since I believe both Go and Pony were at least somewhat Erlang-inspired. I think it would be fair to call Pony a "safer Go" in very roughly the same way that Rust is a "safer C"; similar underlying models, with compile-time formal verification on top, and then that requires tweaking a bunch of the underlying details.

Comparison to Go?

Posted Jan 4, 2025 9:39 UTC (Sat) by dottedmag (subscriber, #18590)

The language seems to be so close to Erlang that the better comparison IMHO is "Erlang with better, but still safe, data sharing"

Comparison to Go?

Posted Jan 5, 2025 0:47 UTC (Sun) by tialaramex (subscriber, #21167)

I think I'd have been happier to agree with your "safer C" description for Rust before some friends who have years of expertise in C but otherwise very different backgrounds tried to learn Rust and all struggled with things I'd thought were obvious. Rust is spelled a lot like C (or perhaps C++) on purpose because it makes C and C++ programmers more comfortable - but that's not what's really going on. My background included a couple of years of the Standard ML of New Jersey (SML), and that's why Rust's types make sense, friends who were just as comfortable in C but hadn't written any ML bounced off the type system because it's not C's type system.

The C-like syntax makes Rust more approachable to "typical" programmers than it might have been otherwise. I watched a talk about Advent of Code recently where Eric explains that he will sometimes deliberately put the problems in the "wrong" order to fool a few people who "know" they can't do a day 20 problem into trying it - now this is a day 10 problem and they've done lots of day 10 problems before so it should be fine. Most of them will learn that actually it might be harder than it looks, but lots of them (including some of that first group) will succeed when otherwise they might not have even tried at all.

But the Rust type system can't hide that it's not C's type system, and it wouldn't want to. C is almost as wrong about types as it could possibly be. Code which should compile doesn't because C wants you to explicitly acknowledge with dedicated syntax if you've got a reference to an object, not the object itself and vice versa. But, code which shouldn't compile does anyway because C doesn't care whether it's sensible to add (for example) a bool to a float or a character to a pointer, only that it can cobble together some meaning for these operations and so that must undoubtedly be what you meant.

Comparison to Go?

Posted Jan 5, 2025 1:08 UTC (Sun) by khim (subscriber, #9252)

> I think I'd have been happier to agree with your "safer C" description for Rust before some friends who have years of expertise in C but otherwise very different backgrounds tried to learn Rust and all struggled with things I'd thought were obvious.

I don't think Rust ever tried to cater for pure C developers. How much C++ was done by your friends?

> C is almost as wrong about types as it could possibly be.

You have to remember that C was born out of typeless BCPL, which has precisely one type: the machine word. That means that C doesn't have any coherent rules in its so-called “type system” (but includes many “clever” hacks… the fact that you can write int x[3] and end up with a pointer… it's just nuts).

C++ tried to fix the most egregious issues with C's type system (and many other things), but because it was designed to be “kinda-sorta-backward-compatible”… it couldn't fix everything (in particular “int x[3]” still could give you a pointer, not an array… but at least the type of 'a' is now char, not int).

Yet so-called modern C++ added enough features to the language and the standard library that it shouldn't be too hard to grok Rust if you know how to return errors from a function with std::expected or specify requirements of your template with requires.

Sadly, in my experience, “years of expertise in C but otherwise very different backgrounds” doesn't automatically imply that someone knows even how to build template data structures!

Comparison to Go?

Posted Jan 5, 2025 14:35 UTC (Sun) by tialaramex (subscriber, #21167)

Neither me nor any of these friends have written actual C++ (doubtless we've used a C++ compiler to compile C in some cases here and there over the many years, but e.g. I never wrote a single template let alone needed TMP). There is some C++ in one of the larger pieces of software I co-wrote back when I was specifically paid to write C for a living, but that code was all written by my colleague Cezary, and I interacted with it only via a C API. It's fairly self-contained, the way that say, a kernel driver would be.

It's conceivable (I think it probably exists on floppy disks, which have likely bit rotted in the intervening years) that my first C was technically C++ because it was written for the Borland C++ compiler on a computer I had only intermittent access to in about 1991 and I was too new to know that e.g. cout << "Foo" isn't C. The earliest stuff I still definitely have is from about 1993-94 and is all actually C (mostly C89 but clearly inflected by seeing too much K&R code); soon after that I went to university and learned SML.

Of course Rust's syntax for generics is borrowed from C++ and Java, but assuming you recognise this syntax you're fine, the semantics are not C++ semantics and I'd guess if anything that's misleading. C++ programmers are used to templates, which are just text mangling again of course and so are duck typed, Rust won't let you do that.

It is plainly wrong to say that C's int x[3] gets you a pointer, that's an array. It decays to a pointer at function edges which is of course miserable, but when we actually make it that's an array, the compiler knows what it is and how big it is - it's actually no worse off than Rust's analogous let mut x: [i32; 3]; in principle.

It's true that I've read a _lot_ of C++ initially because I wanted to understand HashMap, and thus hashbrown, and thus the Swiss Tables and so I ended up watching CppCon and reading the source code. So I probably could write pretty good C++ today if I wanted to (I do not want to). But that all comes after I learned Rust. I read the C++ memory ordering model after I used the Rust implementation of exactly the same model. When I first saw std::expected and std::optional I already knew Result and Option very well. When I read the paper proposing the C++ 11 move semantic I had already been using Rust's default move assignment semantic and thinking this is obvious for some time. When I read Barry Revzin's "trivial union" work I came at that being intimately familiar with the details of MaybeUninit<T> which is basically what Barry is trying to be able to do in C++.

So much to say: I do not believe that C++ helps. It's probably unavoidable, obviously the initial Rust programmers have used C++ because they're at Mozilla which was a C++ shop and which has a codebase that's far older than standard C++, but while the language would be very strange (not least the syntax) if you've never used another semicolon language like C or C++, the type system is equally strange if you've never used an ML such as Ocaml or F#.

I think C++ programmers are actually at a disadvantage because they've often been told that (to quote Herb Sutter) "All you need is class" and that's just not true. C++ class is a product type, it's a very fancy product type, with years of refinement but it's not the right shape for every problem, it is the proverbial hammer and so C++ programmers assume everything is a nail and some things just aren't. More than once I have tried to explain Empty types (Rust's ! is the canonical empty type but Infallible is also a good type to explain why we want this) and C++ programmers just assume I must be mistaken about what this is, because their language can't represent it and so it must not exist right ?

Comparison to Go?

Posted Jan 5, 2025 17:11 UTC (Sun) by khim (subscriber, #9252)

> C++ programmers are used to templates, which are just text mangling again of course

If they are “just text mangling” then why this is a compiler error:

template <typename T>
T duplicate(T t) {
  return t + t;
}

std::string duplicate_string(std::string s) {
    return duplicate(s);
}

template<>
std::string duplicate(std::string s) {
    return s + " " + s;
}

> and so are duck typed, Rust won't let you do that.

Yeah, Rust generics are very much artificially crippled to make them “feel” closer to what one can do in ML languages, but it's another lie – this time to bring more people with ML background on board.

If you play tricks with std::any and transmute_copy then most TMP tricks can be used in Rust, too.

Of course the whole thing feels a bit like an attempt to use JavaScript tricks from TypeScript. As in: you constantly feel that the compiler is there to detect that you are doing something “improper”, but the thing that's under the cover is still the exact same instantiation and monomorphization as in C++.

But modern C++ also brought some tricks that do the same, thus the difference is less than what one may expect.

In fact C++0x (which became C++11) was supposed to get generics in almost the exact same form as Rust has them today – but, as in Rust, this made some powerful techniques extremely painful to use, so instead of enforcing typing (as was planned) C++ went to something closer to Python's type annotations: something that's optional and can be disabled at will.

> It is plainly wrong to say that C's int x[3] gets you a pointer

I said “you may end up with pointer”. Here it's a pointer:

void foo(int*);
void bar(int x[3], int y[4]) {
  x = y;
  foo(x);
}

If it's an array then why x = y is not a compile-time error?
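The adjustment of array parameters to pointers can be checked at compile time. The sketch below reuses the parameter list of bar() from the snippet above; bound_of() and demo_bounds() are hypothetical helpers added for illustration, showing that a local array declaration and a reference-to-array parameter both keep the bound.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Same parameter list as bar() above.
void bar(int x[3], int y[4]);

// Array parameters are adjusted to pointers, so the bounds vanish from the
// function's type.
static_assert(std::is_same<decltype(bar), void(int*, int*)>::value,
              "int x[3] as a parameter means int* x");

// Hypothetical helper: a reference-to-array parameter keeps the bound.
template <std::size_t N>
std::size_t bound_of(const int (&)[N]) {
    return N;
}

std::size_t demo_bounds() {
    int local[3] = {1, 2, 3};  // a variable declaration: a real array
    assert(sizeof(local) == 3 * sizeof(int));
    return bound_of(local);
}
```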

> it's actually no worse off than Rust's analogous let mut x: [i32; 3]; in principle.

AFAIK such x would always be an array. You cannot repoint it anywhere; if you have mut x: [i32; 3] and mut y: [i32; 4] then these are different types, and so on.

> I do not believe that C++ helps

Depends on what exactly you know from C++. Knowing modern C++ most definitely helps, because Rust is basically implementing the same ideas that modern C++ implements, but does so in a cleaner fashion because it could afford to break compatibility.

But if you only know a tiny bit of C++98 and proudly proclaim that you are a C++ programmer because you declare variables in the middle of a function, because the compiler that you use took three decades to go beyond C89, then sure, knowledge of this subset of C++ wouldn't help.

> It's probably unavoidable, obviously the initial Rust programmers have used C++ because they're at Mozilla which was a C++ shop and which has a codebase that's far older than standard C++

The most important part: they used LLVM, which was designed to support C++ from the ground up. That means features that are different from C++ have to be “bolted on”, in many cases.

They could have added things to LLVM to support something radically different, like Swift did, and they eventually even did that (when they added async, e.g.), but core was pretty much dictated by what C++ compiler can do.

> if you've never used another semicolon language like C or C++, the type system is equally strange if you've never used an ML such as Ocaml or F#.

I would say simpler: if you only ever used one language (no matter which language) then any other language would be “strange”. That's where the silly fight about whether C and C++ are “different languages” or one is a “continuation of the other” comes from.

I would say that Rust is much closer to modern C++ than to Ocaml or F#. Even if many ideas of Rust's standard library are borrowed from Ocaml, the type system is closer to C++ – simply because of how it's implemented.

> I think C++ programmers are actually at a disadvantage because they've often been told that (to quote Herb Sutter) "All you need is class"

That's not modern C++, that's “C with classes”. Today “classic OOP” is rare to see in modern C++ code – but there are plenty of templates and lambdas.

> C++ class is a product type, it's a very fancy product type, with years of refinement but it's not the right shape for every problem, it is the proverbial hammer and so C++ programmers assume everything is a nail and some things just aren't.

Except nothing in C++ after C++98 extended that “proverbial hammer”. All improvements in C++ after C++98 are bringing in things that are in Rust, too. And the other way around, too, of course. Remember how much effort it took to add [const generics](https://blog.rust-lang.org/2021/02/26/const-generics-mvp-beta.html) and [GATs](https://blog.rust-lang.org/2022/10/28/gats-stabilization.html) to Rust? Well… C++98 already had both. It even used GATs in its standard library.

> and C++ programmers just assume I must be mistaken about what this is, because their language can't represent it and so it must not exist right ?

Except their language, of course, could represent it! It's what a noreturn function returns. C (and C++) don't have an empty type either and yet they have a pointer to such a type and even two different kludges to deal with it.

Ultimately it's not about the difference between C++ and Rust but about the desire to alter your mental map of the world. This, again, sends us back to “knowledge of C or C++” vs “knowledge of modern C++”: not only is modern C++ closer to Rust than any other language (yes, including ML dialects), but, more importantly, modern C++ is far enough removed from C that someone who just simply refuses to accept changes is unlikely to learn it.

The litmus test, in my experience, is this innocent example from cppreference (meaning it's not something one would see in a blog post but quite literally an example from the description of the language):

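#include <iomanip>
#include <iostream>
#include <string>
#include <variant>
#include <vector>

using var_t = std::variant<int, long, double, std::string>;

// helper type that merges several lambdas into one overload set
template<class... Ts>
struct overloaded : Ts... { using Ts::operator()...; };
// explicit deduction guide (not needed as of C++20)
template<class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

int main()
{
    std::vector<var_t> vec = {10, 15l, 1.5, "hello"};

    for (auto& v: vec)
    {
        std::visit(overloaded{
            [](auto arg) { std::cout << arg << ' '; },
            [](double arg) { std::cout << std::fixed << arg << ' '; },
            [](const std::string& arg) { std::cout << std::quoted(arg) << ' '; }
        }, v);
    }
}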

I'm yet to see anyone who may write code like this and yet couldn't easily pick up Rust.

Comparison to Go?

Posted Jan 5, 2025 21:46 UTC (Sun) by tialaramex (subscriber, #21167)

There's a lot here and this is rather a distraction from an article about Pony, but, briefly:

That compiler error is just because C++ insists on things being written in a specific order, just re-arrange the specialisation so that it's earlier in the source code.
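A sketch of that rearrangement, reusing the names duplicate() and duplicate_string() from the earlier snippet; the check_duplicate() helper is an addition here, included only to exercise the result.

```cpp
#include <cassert>
#include <string>

template <typename T>
T duplicate(T t) {
    return t + t;
}

// Explicit specialization, declared before anything causes
// duplicate<std::string> to be implicitly instantiated.
template <>
std::string duplicate(std::string s) {
    return s + " " + s;
}

std::string duplicate_string(std::string s) {
    return duplicate(s);  // now selects the specialization above
}

void check_duplicate() {
    assert(duplicate(2) == 4);                  // primary template
    assert(duplicate_string("ab") == "ab ab");  // specialization
}
```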

Yes the C++ 0x Concepts were largely equivalent to Rust's traits, but they weren't what Bjarne wanted and so the Concepts Lite in C++ 20 is closer to Bjarne's original concept from decades earlier.

I was assuming (and perhaps you intended?) that the array was a variable declaration. You're correct that as a part of the function signature these are silently a pointer. I'd love to believe that common C compilers warn you about this if you do it, but I find that I don't even care enough to check.

I don't agree at all that these are the same ideas except in the broad sense that some ideas (e.g. the move assignment semantic) pre-date C++ attempting them and were known good ideas in PLD anyway so the fact they're in both C++ and Rust is a coincidence. That's true for Option for example, std::optional is what you might build if you were the C++ committee and you saw a Maybe type (common in many functional languages) and wanted that for C++. And you could say the same about Option and Rust, but neither directly inspired the other.

And I can't help you on Herb's position, Herb Sutter said that indeed he sang it repeatedly, on stage, it was the whole thesis of a large section of one of his "future of C++" keynotes. Take it up with Herb, not me, if you disagree about the modern C++ language, he's the convener. The other user defined types are not gifted anywhere near the power given to class.

Likewise I can't help you with the belief that somehow an attribute hack is a type, we can actually ask C++ what type is returned by a noreturn function and it'll happily tell us the type from the function signature, which is in this sense wrong. No Unique Address is a kludge for ZSTs, but the Empty type isn't a ZST you're probably thinking of the unit type. Monostate is another unit type hack, so that checks out but again, not the Empty type. Actually you're giving about the same answers as I described, remember I have heard all this before.

Comparison to Go?

Posted Jan 5, 2025 22:34 UTC (Sun) by khim (subscriber, #9252)

> That compiler error is just because C++ insists on things being written in a specific order, just re-arrange the specialisation so that it's earlier in the source code.

Sure, but that's not how Rust or even C macros work. There, the compiler really doesn't care about anything but tokens.

And “template” systems written in C macros never care about things like whether a specialization is already defined at instantiation time or not.

C++ does… and ironically enough Rust does, too. That's one of the reasons why Rust's specialization is still unstable and why Rust does have such elaborate orphan rules.

> I was assuming (and perhaps you intended?) that the array was a variable declaration.

Why would you assume that? I faced that issue with Vulkan marshalling. blendConstants have float blendConstants[4]; array, vkCmdSetBlendConstants have float blendConstants[4]; pointer… and yes they are described identically in the “source of truth” API definition. You can't tell what you are working with just from the entity definition (you need context-dependent parsing). Very annoying.

> I'd love to believe that common C compilers warn you about this if you do it

They couldn't. There are bazillion APIs with such pointers. I wonder if all people involved even had an idea that they are actually passing pointer and not array.

> And you could say the same about Option and Rust, but neither directly inspired the other.

That's a pretty bold assertion if you recall that people who do the work on Rust have to know C++ and often refer to things that they find in C++ proposals. There are tons of references to various C++ proposals on IRLO with discussions about whether or not these proposals would be useful for Rust.

> Actually you're giving about the same answers as I described, remember I have heard all this before.

This explains things.

> Likewise I can't help you with the belief that somehow an attribute hack is a type, we can actually ask C++ what type is returned by a noreturn function and it'll happily tell us the type from the function signature, which is in this sense wrong. No Unique Address is a kludge for ZSTs, but the Empty type isn't a ZST you're probably thinking of the unit type. Monostate is another unit type hack, so that checks out but again, not the Empty type.

If that is what you have been saying to your C friends who have years of expertise in C but otherwise very different backgrounds then no wonder they couldn't get Rust! You may say that Rust's never type is different from function attribute, but that's like discussing the details of how you need wheels for the car but horseshoes for the horse: minutia details of implementation that inhibit the understanding. Especially if you recall that never type in today's Rust is not even a type yet! Sure, Rust developers are working on it, while C developers are happy with hack… but these are hacks, in both cases. Even if Rust version may sometime become a proper type.

For me the experience is total opposite: if you tell people about similarities between C++ and Rust instead of trying to discourage understanding by saying that things work in Rust like in ML and not like in C++ (when all three are similar yet different) then sure, that's one way to make people confused and uncertain.

While Rust picked many ideas from ML and a pile of other languages, “under the hood” it's built on top of a C++ compiler and this affects many things in it very deeply.

Comparison to Go?

Posted Jan 6, 2025 16:58 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (1 responses)

! is in practice a real type, you're just not yet allowed to use that name in stable Rust, I'd guess it might stabilize this year. Types you can't name aren't a big deal, both Rust and C++ have types you can't name.

You can make your own Empty user defined type easily today and that'll have a name, enum KhimDemo {} now KhimDemo is an empty type. We cannot make any value of this type, and so if our function claims to return one we know that function diverges, if some code seems to need to assign one to a variable that code never executes and so on, the compiler can prune lots of dead code as a result.
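A minimal sketch of the empty-enum idea described above. Only `enum KhimDemo {}` comes from the comment; the `parse` and `unwrap_infallible` functions are hypothetical names added for illustration.

```rust
// An empty enum: the type exists, but no value of it can ever be constructed.
#[allow(dead_code)]
enum KhimDemo {}

// Hypothetical function whose error type is empty, so it can never fail.
fn parse(input: &str) -> Result<u32, KhimDemo> {
    Ok(input.len() as u32)
}

// Hypothetical helper that extracts the value without any panic path.
fn unwrap_infallible(r: Result<u32, KhimDemo>) -> u32 {
    match r {
        Ok(v) => v,
        // No value of KhimDemo exists, so an empty match covers every case
        // and the compiler treats this arm as unreachable.
        Err(never) => match never {},
    }
}

fn main() {
    let n = unwrap_infallible(parse("pony"));
    println!("{}", n);
    // A type with no values needs no storage.
    println!("{}", std::mem::size_of::<KhimDemo>());
}
```

The empty `match never {}` is what lets the compiler prove that the error branch is dead code, rather than relying on a runtime `unreachable!()`.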

C++ has a trait predicate std::is_empty which is true for a unit type, which is a bit like when people convince themselves that the multiplicative identity and the additive identity should be the same in arithmetic, thus 1==0. And that's the exact confusion you've also demonstrated by suggesting No Unique Address and monostate are relevant here.

Comparison to Go?

Posted Jan 9, 2025 0:52 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

> You can make your own Empty user defined type easily today and that'll have a name, enum KhimDemo {} now KhimDemo is an empty type. We cannot make any value of this type, and so if our function claims to return one we know that function diverges, if some code seems to need to assign one to a variable that code never executes and so on, the compiler can prune lots of dead code as a result.

The (current version of the) Rust stdlib is even kind enough to provide one of these out of the box. It's called Infallible, and is used pervasively as a substitute for ! in Result<T, !> (e.g. the blanket impl of TryFrom<T> for T currently returns Result<T, Infallible>, but when ! is stable it will be changed to Result<T, !>), hence the name "Infallible." The compiler is already smart enough to treat Infallible very similarly to ! for many (but not all) purposes. Infallible is not magic, and is literally defined as an empty enum just like your KhimDemo.
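As a rough sketch of how this looks in practice, the following uses the blanket `TryFrom<T> for T` impl mentioned above. The `into_ok` helper is a hypothetical name introduced here, not a standard library function.

```rust
use std::convert::{Infallible, TryFrom};

// Hypothetical helper: unwraps a Result whose error type has no values.
fn into_ok<T>(r: Result<T, Infallible>) -> T {
    match r {
        Ok(v) => v,
        // Infallible is an empty enum, so this empty match is exhaustive.
        Err(e) => match e {},
    }
}

fn main() {
    // Converting a u8 to a u8 goes through the blanket impl,
    // whose associated Error type is Infallible.
    let r: Result<u8, Infallible> = u8::try_from(7u8);
    let v = into_ok(r);
    println!("{}", v);
}
```

Because `Infallible` is an ordinary empty enum, the same empty-match trick works on it as on any user-defined empty type.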

> C++ has a trait predicate std::is_empty which is true for a unit type, which is a bit like when people convince themselves that the multiplicative identity and the additive identity should be the same in arithmetic, thus 1==0.

I don't think it's quite that bad, I think this is just a case of poor terminology. C++ has no true empty types, at least in my understanding of the language. Unit and monostate are "empty" in the sense that they behave a lot like an empty struct (which is also a unit type, in languages that allow empty structs). That is not how the term "empty" is used in type theory, but calling it std::is_unit would confuse the hell out of programmers who have no knowledge of ADTs or type theory, and the abuse of terminology causes no confusion because there is no empty type to confuse it with. I'm also of the opinion that monostate was a terrible name, for exactly the same reason (but I'm not entirely sure what name they should have used instead).

Comparison to Go?

Posted Jan 10, 2025 17:52 UTC (Fri) by ralfj (subscriber, #172874) [Link] (1 responses)

> Yeah, rust generics are very much artificially crippled to make them “feel” closer to what one can do in ML languages, but it's another lie – this time to bring more people with ML background on board.

Rust generics are not modeled after ML modules at all, they are modeled after Haskell typeclasses. So Rust feels nothing like ML when it comes to generic programming, it's a completely different style. Have you ever actually programmed with ML-style modules to be able to do a qualified comparison?

And if you consider "ensuring the equivalent of basic type safety for generics" to be "artificially crippling" the language, you should also call C++ "artificially crippled" because it doesn't let you use "+" on a "std::list" and a "std::hash_map", not even inside dead code where it clearly doesn't matter! That's just what a proper type system does: it rejects code that can't be easily shown to follow some basic correctness principles. There's nothing at all artificial about it. C++ chose to not do static typing for its generics, Rust chose to do static typing on its generics. IMO Rust made the right call. You don't have to like it, not everyone likes statically typed languages, but this kind of dismissal is entirely unwarranted.

Comparison to Go?

Posted Jan 10, 2025 18:39 UTC (Fri) by khim (subscriber, #9252) [Link]

> Rust generics are not modeled after ML modules at all, they are modeled after Haskell typeclasses. So Rust feels nothing like ML when it comes to generic programming, it's a completely different style.

Well… the important thing, verification before instantiation, is the same, but I guess you would know better.

> Have you ever actually programmed with ML-style modules to be able to do a qualified comparison?

Yes, but that was long ago and I wasn't doing much generic programming thus I defer to your experience: if you say that Rust picked its generics from Haskell and not ML then so be it. My point was that they are entirely different from C++ or Zig.

> And if you consider "ensuring the equivalent of basic type safety for generics" to be "artificially crippling" the language, you should also call C++ "artificially crippled" because it doesn't let you use "+" on a "std::list" and a "std::hash_map", not even inside dead code where it clearly doesn't matter!

Thanks for the vote of confidence! Yes, that was an issue with C++ before C++17. The problem was that “dead code” without extra markup depends on the quality of optimizer. Making validity of someone's code depend on quality of the optimizer wasn't a good idea and that's why this limitation was in place till C++17 fixed it: now, if you use if constexpr compiler understands that code would be “guaranteed dead” and in that code it's perfectly valid to add std::list and std::unordered_map. Because it happens inside dead code where it clearly doesn't matter!

That's precisely what turned TMP from crazy wizardry accessible only to some “initiated” into an easy and simple instrument.

Note that example from excors already relies on that extension. It's literally everywhere in modern C++!

> That's just what a proper type system does: it rejects code that can't be easily shown to follow some basic correctness principles.

Sure. And that turns it into a straitjacket. Quite stifling and limiting. That was my point.

> There's nothing at all artificial about it.

Of course there is! If you use tricks to circumvent it (accept a type that doesn't implement Debug, e.g., and then print it using a well-known loophole)… everything works.

Every typesystem is “artificial” to some extent, but when we are dealing with things like integers or structs that have different representations in memory… Rust protects you from easily shooting yourself in the foot and it also provides an easy way to tell the compiler to reinterpret object of one type and object of some other type.

But when we are dealing with types… checks are much stricter yet there are no official way to circumvent them when that's needed. Why? What's the reasoning behind that decision?

> IMO Rust made the right call.

It made the right call WRT defaults, sure. When your task can easily fit in a statically typed world… it's great: error messages are better, there is less code to debug… but the fact that Rust does provide unsafe for its typesystem but nothing “official” is provided for its meta-typesystem is jarring. The fact that if const in Rust doesn't work like if constexpr in C++ and there is no way to look at the type and act on it is very stifling.

The only hope is that eventually someone would take Rust and produce Rust++ to make it possible to easily use Rust and C++ in one project.

Comparison to Go?

Posted Jan 10, 2025 17:57 UTC (Fri) by ralfj (subscriber, #172874) [Link] (1 responses)

> The most important part: they used LLVM which was designed to support C++ from the ground up. Means features that are different from C++ have to be “bolted on”, in many cases.
>
> They could have added things to LLVM to support something radically different, like Swift did, and they eventually even did that (when they added async, e.g.), but core was pretty much dictated by what C++ compiler can do.

Are you still talking about Rust here? This is entirely wrong.^^ It would take too long to list all the ways in which this is wrong, so just two points: Most of C++ (such as all of the template logic) is implemented in clang, not LLVM. Also, C++ compilers cannot do ownership nor borrowing, and clearly that did not stop Rust.

> I would say that Rust is much closer to modern C++ than to Ocaml or F#. Even if many ideas of Rust's standard library are borrowed from Ocaml type system is closer to C++ – simply because of how it's implemented.

The scale at which you are spreading completely unfounded falsehoods here is pretty astounding. I won't bother with a point-by-point rebuttal, I am just leaving this note here so that other readers know to take your writing with a bag of salt.

Comparison to Go?

Posted Jan 10, 2025 19:03 UTC (Fri) by khim (subscriber, #9252) [Link]

> This is entirely wrong.^^

How?

> Most of C++ (such as all of the template logic) is implemented in clang, not LLVM

Sure. But the important thing that generated code is monomorphic, that is: neither C++ nor Rust can generate a single instance of code that deals with different types (like Ada, Java, C#, Swift and most other languages that have generics can do), they are producing entirely different functions for each type (which may then be combined at later stage, but only if machine code generated is identical).

That was dictated by the clang/llvm split as far as I know.

> Also, C++ compilers cannot do ownership nor borrowing, and clearly that did not stop Rust.

And mrustc also can not do ownership and borrowing and yet it compiles Rust just fine. And if you couple it with an entirely decoupled Rust frontend which would check ownership and borrowing it would become an entirely correct Rust compiler. Means “ownership nor borrowing” is an entirely separate module from everything else.

> Are you still talking about Rust here? This is entirely wrong.^^

Well… since we are talking about the role that LLVM played in the design of Rust then I would rather go with the opinion of Graydon Hoare than yours. With all due respect to your work… he was there when that design happened. And he very clearly points to LLVM as the culprit for many design decisions.

That's how Rust typesystem ended up as combination of worst sides of generics (usually polymorphic with one piece of code for all supported types in most popular languages) and templates (monomorphic over type and thus capable of doing various tricks with different types).

From what I understand there was hope to make it possible to have polymorphic generics, but that just have never worked properly because of LLVM limitations and team could never settle on whether they want flexibility of templates or possibility of handling “foreign” types (like most generic systems can do).

The end result is neither here nor there and thus doesn't play well with intuition of both camps.

Comparison to Go?

Posted Jan 5, 2025 17:48 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (2 responses)

I think you are vastly overestimating the level of knowledge required for someone to be proficient in a systems language with C++-like complexity.

> It's true that I've read a _lot_ of C++ initially because I wanted to understand HashMap, and thus hashbrown, and thus the Swiss Tables and so I ended up watching CppCon and reading the source code. So I probably could write pretty good C++ today if I wanted to (I do not want to). But that all comes after I learned Rust. I read the C++ memory ordering model after I used the Rust implementation of exactly the same model. When I first saw std::expected and std::optional I already knew Result and Option very well. When I read the paper proposing the C++ 11 move semantic I had already been using Rust's default move assignment semantic and thinking this is obvious for some time. When I read Barry Revzin's "trivial union" work I came at that being intimately familiar with the details of MaybeUninit<T> which is basically what Barry is trying to be able to do in C++.

The average professional C++ developer understands, at best, half of those things. To be more specific:

* Most C++ developers probably know that std::unordered_map is O(1) in all important operations, and they probably have a loose understanding of how a hash table works in principle. They probably know what "open" and "closed" addressing mean. They probably don't know the details of a modern hash table implementation.
* Most C++ developers haven't the faintest idea what a "memory ordering model" is or how to reason about it. They probably understand that races exist and can be prevented with locking. They probably know that atomics exist, but not how to use them correctly.
* Most C++ developers have a general understanding of move semantics, but if you ask them to explain how std::forward works internally, they will not have a damn clue ("I dunno, just copy the sample code from cppreference and it'll work"). They can tell you not to write "return std::move(foo);", but they cannot tell you why (some developers know enough to vaguely gesture at NRVO, which they may wrongly refer to as "copy elision," but this is an incomplete explanation at best).

But the same also goes for Rust:

* Most Rust developers think of HashMap as a black box, much like their C++ counterparts.
* Rust developers have probably heard of Send and Sync, and are probably aware that mutable objects which will be shared between threads need to be protected by some type in std::sync, usually a lock or atomic of some kind. In practice, the idiot-proof options are Mutex and RwLock, so most developers are probably going to reach for one of those (possibly wrapped in an Arc if the borrow checker complains about it). The smart developers know that OnceLock/LazyLock exist. Of course, there are situations where you need to reach for atomics, but the expectation is that you know what you're doing before you go down that road.
* Rust's move semantics are simpler than C++'s (no moved-from objects, no rvalue references, no move/copy overload resolution, moves are memcpy, etc.), so I would expect Rust developers to have a firmer understanding of them in practice, but that also means that Rust's semantics are easier to learn, especially if you already know about C++'s semantics. Going from C++ to Rust is mostly an exercise in "stop thinking so damn hard and just write it the obvious way."
* Rust does introduce lifetime semantics, which make things harder, but only if you want to have long-lived borrows. If you internalize the rule that borrows should be ephemeral, most borrow checker errors can be boiled down to "the borrow checker thinks this borrow is not ephemeral enough." The remainder can be phrased as "you really do need this borrow to be non-ephemeral, so now you have to explain how that works to the borrow checker (or just wrap it in Rc/Arc and call it a day)." C++ developers should already be familiar with the general concept that references should not outlive the underlying object (or else your code will be littered with UAF bugs).
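A minimal sketch of the "Mutex wrapped in an Arc" pattern from the list above. The function name `parallel_sum` and its parameters are hypothetical, chosen only for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical example: several threads increment one shared counter.
fn parallel_sum(workers: u32, per_worker: u32) -> u32 {
    // Arc provides shared ownership across threads; Mutex guards mutation.
    let total = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own handle to the same counter.
            let total = Arc::clone(&total);
            // `move` transfers that handle into the closure; the original
            // binding in this scope is not usable afterwards.
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // Copy the value out so the lock guard is released before returning.
    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("{}", parallel_sum(4, 1000));
}
```

Dropping either the `Arc` or the `Mutex` makes the program fail to compile, which is the sense in which these are the hard-to-misuse options.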

Comparison to Go?

Posted Jan 5, 2025 18:30 UTC (Sun) by khim (subscriber, #9252) [Link] (1 responses)

> if you ask them to explain how std::forward works internally, they will not have a damn clue ("I dunno, just copy the sample code from cppreference and it'll work")

I think that's the central piece: some people try to understand how things really work – and some don't think it's even worth doing.

The former build some kind of mental worldmap in their head. Even if sometimes they build wrong map that doesn't correspond to the reality – it's easy to fix it, when such difference is found.

And these mental maps are very similar in C++ and Rust. Rust simplifies many things (while simultaneously making some other more complicated… TANSTAAFL at its best) and replaces STL with more Lisp/ML/Haskell style standard library (but if you build a mental map of how things work the actual names of the functions are not that important… there's Google, after all) – and thus Rust is perceived as “C++ done right”.

But for people of “just copy the sample code from cppreference and it'll work” parlance, who collect useful snippets of code without even trying to understand how and why code looks this and not that way… for them switch from C or C++ to Rust is really painful, because all their collections of copy-pasted code are, suddenly, worthless (or almost worthless).

But these people also have trouble embracing new paradigms that are arriving with new versions of C++!

> which they may wrongly refer to as "copy elision,"

Why “wrongly”? It's the official name for what is happening.

It may not be very appropriate because what is eliminated is not copy, but materialization of prvalue, but trying to invent your own terms doesn't help if you want to discuss things with someone, better to use terms that official documentation is using… even if they are not 100% correct.

Comparison to Go?

Posted Jan 5, 2025 19:31 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

> Why “wrongly”? It's the official name for what is happening.

You're right, I mixed NRVO up with so-called "guaranteed copy elision," which is not actually a form of copy elision at all, but falls out of the semantics of prvalues. Which just goes to show how hard it is to keep track of C++ semantics even when you're trying to understand it all properly.

Comparison to Go?

Posted Jan 5, 2025 18:26 UTC (Sun) by excors (subscriber, #95769) [Link] (4 responses)

> C++ programmers are used to templates, which are just text mangling again of course and so are duck typed, Rust won't let you do that.

This is mostly off-topic but: It's not just text mangling, once you get into metaprogramming. It's a powerful duck-typed language for writing programs that execute at compile-time, where the values in that language include ints, bools, and (crucially) C++ types. The use of C++ types as values is what makes it "meta" and distinguishes it from compile-time evaluation of regular C++ code, and also distinguishes it from the C preprocessor (which _is_ basically text mangling). The metaprogram can see all the types (including classes) in your non-meta code, and can output new types and functions for use by the non-meta code.

For example you can write something like `template<typename T> auto deref(T p) { if constexpr (std::is_pointer_v<T>) return deref(*p); else return p; }` which will convert e.g. `int***` to `int`. The metaprogram has input parameter `T` which is a C++ type; it can observe and manipulate that type in various ways (like with `is_pointer_v` to test if it's a pointer type), and in this case it outputs a C++ function that only includes the `*p` expression when it's legal to do so.

(Originally C++'s metaprogramming language was horrifically ugly, very hard to use, a pure functional language, and bore little resemblance to non-meta C++; it existed more by accident than by design. Nowadays with features like `if constexpr`, it's not always so bad, though it's often still quite bad.)

I think the closest equivalent functionality in Rust is procedural macros, but those are compile-time programs that operate over Rust tokens or ASTs, not over Rust types, so they're not really very similar. (I think no other language has anything very similar to C++ template metaprogramming, because you can get 90% of its functionality with very different designs that are 90% less insane.)

Comparison to Go?

Posted Jan 5, 2025 18:41 UTC (Sun) by khim (subscriber, #9252) [Link]

> I think no other language has anything very similar to C++ template metaprogramming, because you can get 90% of its functionality with very different designs that are 90% less insane

That's funny because the problem with C++ TMP is not the fact that it has too much functionality, but the fact that it has so little. There is nothing that's “very similar to C++ template metaprogramming” not because it's so powerful, but because it's so limited. C++ is slowly moving in the direction of Zig's comptime thus, in some sense, C++ is becoming like other languages (with metaprogramming in something that looks somewhat similar to the “regular”, “main” language), not the other way around.

Comparison to Go?

Posted Jan 9, 2025 18:31 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (2 responses)

Hot take: Template metaprogramming is just Prolog with uglier syntax.

OK, OK, that's mostly wrong, but SFINAE is literally doing unification and backtracking, just like in Prolog.

Comparison to Go?

Posted Jan 9, 2025 18:41 UTC (Thu) by khim (subscriber, #9252) [Link] (1 responses)

Sure. But Rust's trait resolver is, essentially, the same thing just with a few more artificial limitations. But if you manage to confuse it enough… it works in the same way as C++ resolver.

Rust's type system and traits are to C++'s type system and templates more-or-less what TypeScript is to JavaScript: ultimately it's the exact same duck-typing, deep inside, only now with an extra layer of typechecking on top.

The only substantial difference lies in the fact that with TypeScript you have an explicit “escape hatch”, but with Rust templates you need to fool the compiler to turn your generics into templates.

Comparison to Go?

Posted Jan 15, 2025 15:50 UTC (Wed) by taladar (subscriber, #68407) [Link]

The more substantial difference is that Rust doesn't routinely abuse the trait resolver in a way that produces error messages that could give small children nightmares while in C++ that is standard practice.

Correction

Posted Jan 7, 2025 21:58 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

I just had a conversation with one of those friends and they inform me that in fact they did write C++ 98 - and that I ought to have known that because 25 or so years ago I refused to help on a project which they were writing in C++

So, I apologise for the previously misleading claim and retract it.

Comparison to Go?

Posted Jan 5, 2025 15:40 UTC (Sun) by rweikusat2 (subscriber, #117920) [Link] (8 responses)

> C is almost as wrong about types as it could possibly be.

[...]

> code which shouldn't compile does anyway because C doesn't care whether it's sensible to add (for example) a
> bool to a float or a character to a pointer

C has no character type and doesn't really have a bool type either, only as latter-day addition by people who already didn't understand the C type system (or also chose to misunderstand it intentionally). Hence, it's impossible to "add a character to a pointer" in C and equally impossible to "add a bool to a float". In C, the former is really "adding an integer to a pointer" and the latter "adding an integer to a float". The only non-composite types C knows about are integers, floating-point numbers and pointers.

Comparison to Go?

Posted Jan 5, 2025 23:32 UTC (Sun) by dvdeug (guest, #10998) [Link] (7 responses)

>> code which shouldn't compile does anyway because C doesn't care whether it's sensible to add (for example) a
>> bool to a float or a character to a pointer

> C has no character type

C has a type named char that is guaranteed to hold a character (modulo non-ASCII characters that K&R C didn't have to support), written in the source code as 'a'.

> Hence, it's impossible to "add a character to a pointer" in C and equally impossible to "add a bool to a float".

void f (void *p) {
    float pi = 3.14;
    p += 'a';
    pi += 0 < 1;
}

compiles without warnings on GCC -Wall. That just proves the post you're responding to; C doesn't care whether it's sensible to add 'a' to a pointer, and arguing that 'a' is just a fancy way to write an integer in C and isn't a character is silly.
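For contrast with the C snippet above, here is a rough sketch of what the same two additions look like in Rust, where each conversion has to be spelled out. The function names are hypothetical, and a plain index stands in for the pointer.

```rust
// Hypothetical helper: `x + flag` is a type error in Rust, so the
// bool -> u8 -> f32 conversion chain must be written explicitly.
fn add_bool_to_float(x: f32, flag: bool) -> f32 {
    x + f32::from(u8::from(flag))
}

// Hypothetical helper: `index + c` is also a type error; the char has to be
// converted to its code point before it can be used as an offset.
fn offset_by_char(index: usize, c: char) -> usize {
    index + u32::from(c) as usize
}

fn main() {
    println!("{}", add_bool_to_float(3.0, 0 < 1));
    println!("{}", offset_by_char(1, 'a'));
}
```

The operations remain possible, but the programmer has to state that a boolean or a character is being reinterpreted as a number.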

Comparison to Go?

Posted Jan 6, 2025 12:39 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link] (6 responses)

>>> code which shouldn't compile does anyway because C doesn't care whether it's sensible to add (for example) a
>>> bool to a float or a character to a pointer

>> C has no character type

> C has a type named char that is guaranteed to hold a character, [...] written in the source code as 'a'.

As I wrote in the part of my text you've chosen to ignore: char is an integer type guaranteed to be large enough for the codepoint of any character in the so-called basic execution character set and 'a' is literal of type int whose value is the codepoint of the character a in the character set (encoding) that's being used, usually 65 for ASCII.

>> [...] "add a character to a pointer" in C and equally impossible to "add a bool to a float".

> void f (void *p) {
> float pi = 3.14;
> p += 'a';
> pi += 0 < 1;
> }
>
> compiles without warnings on GCC -Wall. That just proves the post you're responding to; C doesn't care whether it's sensible to add 'a' to a pointer, and arguing that 'a' is just a fancy way to write an integer in C and isn't a character is silly.

"Silly" is value judgement of yours which doesn't change the C language definition where 'a' is (see above) an integer literal of type int and < is defined as having a result of type int which is 1 if the relation is true and 0 otherwise.

Comparison to Go?

Posted Jan 6, 2025 17:47 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link]

ASCII codepoint of 'a' is 97 and not 65, that's 'A'.

General property of ASCII that's sometimes useful: The difference between an uppercase letter and the corresponding lowercase letter is that the 6th bit (32) is set for the latter and clear for the former.
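A small sketch of that ASCII property, with a hypothetical `toggle_case` helper that flips the bit with value 32 for letters only:

```rust
// Hypothetical helper: flips the case of an ASCII letter by toggling
// the bit with value 32 (0x20); other bytes are returned unchanged.
fn toggle_case(c: u8) -> u8 {
    if c.is_ascii_alphabetic() {
        c ^ 0x20
    } else {
        c
    }
}

fn main() {
    // 'A' is 65 and 'a' is 97; they differ only in the 0x20 bit.
    println!("{} {}", b'A', b'a');
    println!("{}", toggle_case(b'a') as char);
    println!("{}", toggle_case(b'Z') as char);
}
```

The alphabetic check matters because the same bit distinguishes unrelated pairs outside the letter ranges, such as '@' and '`'.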

Comparison to Go?

Posted Jan 6, 2025 18:04 UTC (Mon) by dvdeug (guest, #10998) [Link] (4 responses)

Going back to the original complaint:
> C doesn't care whether it's sensible to add (for example) a bool to a float or a character to a pointer,

The fact that the C standard calls char an integer type is why C doesn't care. It doesn't change the fact you're adding a character to a pointer, it just means that C doesn't see it that way.

You wrote
> "Silly" is value judgement of yours

Yes.

>which doesn't change the C language definition

Which is irrelevant. If you claimed that Ada generics aren't powerful, could I point out the Ada language definition says "Finally, the language provides a powerful means of parameterization of program units, called generic program units" and claim victory? If someone is arguing that C's type system is deficient because it lets you add a bool to a float, it doesn't help to point out that C just treats bools as integers.

Comparison to Go?

Posted Jan 6, 2025 19:05 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link] (3 responses)

>> C doesn't care whether it's sensible to add (for example) a bool to a float or a character to a pointer,
> The fact that the C standard calls char an integer type is why C doesn't care. It doesn't change the fact you're
> adding a character to a pointer, it just means that C doesn't see it that way.

There is no such thing as a character type in C and hence, there's no way to add a character to anything in C. That's a fact which is part of the language definition.

> If someone is arguing that C's type system is deficient because it lets you add a bool to a float, it doesn't help to
> point out that C just treats bools as integers.

Likewise, there is no such thing as a bool type in C that's distinct from an integer type and hence, adding a bool to a float is also something that's impossible in C. That's also part of the C language definition. It's obviously possible to argue that the C type system should really contain concepts like character or boolean types but as a matter of fact, it doesn't. Hence, this particular criticism is disingenuous. C allows addition of numbers and due to the relative paucity of the type system, numbers are also employed to represent entities (like characters or boolean values) which have types of their own in other programming languages. But that's strictly a matter of interpretation. 97 is an integer. In certain contexts, it might represent the character a and in others, it doesn't.

Comparison to Go?

Posted Jan 6, 2025 19:34 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> There is no such thing as a character type

What is a "character type"?

Comparison to Go?

Posted Jan 6, 2025 19:46 UTC (Mon) by adobriyan (subscriber, #30858) [Link]

This whole confusion is because "character constant" token doesn't have type "char" in C.

"char" of course officially exists now: _Generic dispatches on "char" as it does on "_Bool".
Informally, BTCP recognised "char" as real type for quite some time.

Comparison to Go?

Posted Jan 7, 2025 4:41 UTC (Tue) by dvdeug (guest, #10998) [Link]

I once heard someone claim that many older programming languages were named for what they process, like LISP for lists, FORTRAN for formulas, SNOBOL for strings, and C for characters. Jokes aside, a whole lot of early code written in C was for character processing; a C compiler, sed, ed, vi, awk, half the programs in GNU Coreutils.

> There is no such thing as a character type in C and hence, there's no way to add a character to anything in C.
...
> It's obviously possible to argue that the C type system should really contain concepts like character or boolean types but as a matter of fact, it doesn't. Hence, this particular criticism is disingenuous.

No, this particular criticism is not disingenuous. What's disingenuous is when you take a language that has built-in features to handle character data and a type designed to hold character data (named char) and character constants, where you can take a pointer p and add 'a' to it, and act like criticizing that is unreasonable because the text of the standard doesn't have a character type.

Oh, and C11 §6.2.5.15 says "The three types char, signed char, and unsigned char are collectively called the character types." I don't have access to all the C standards documents, but at least that one officially says it has character types. So not only is your claim trying to use pedantry to avoid reality, it's using pedantry that's not correct.

Cyclic references in garbage collection

Posted Jan 3, 2025 16:46 UTC (Fri) by epa (subscriber, #39769) [Link] (4 responses)

I didn’t understand how cycles are avoided in garbage collection. Surely you can have an object A in one actor which points to object B in another, and B points back to A? Granted, the objects would not be mutable, but still self-referential and creating a cycle that could not be garbage collected, even if there are no other references to these two objects.

Cyclic references in garbage collection

Posted Jan 3, 2025 17:09 UTC (Fri) by daroc (editor, #160859) [Link]

The trick is that you can create a cycle within an actor as much as you want, but you cannot send only part of a cycle to another actor unless the data is immutable, and you can't create a cycle between two actors. (With the exception of tag pointers, which are slightly magic; see below.)

So imagine that I have mutually referential objects A and B.

If I have an isolated pointer to the whole cycle, I can send the whole cycle to another actor. But A and B can't directly reference each other with isolated pointers, because then either they would immediately be garbage, or the isolated pointers wouldn't be unique.

For value pointers, the objects are immutable, so sending B to another actor doesn't cause problems — from the first actor's point of view, it just pins the cycle as a garbage collection root until the second actor is done with it.

For reference, box, and transition pointers, the objects can't be sent to another actor.

For tag pointers, the runtime pulls a little bit of magic to detect cycles at the same time it detects dead actors; you're correct that you can actually have between-actor cycles using tag pointers, and this complicates things. For the purposes of garbage collection, though, they really are just treated as collection roots, with cycles getting cleaned up by the separate system that reaps dead actors.

Cyclic references in garbage collection

Posted Jan 3, 2025 17:10 UTC (Fri) by sionescu (subscriber, #59410) [Link] (2 responses)

You can solve cross-actor referencing if each actor has its own private heap/arena and sending messages does a deep copy. That's pretty much what Erlang does.

Cyclic references in garbage collection

Posted Jan 3, 2025 17:33 UTC (Fri) by daroc (editor, #160859) [Link]

Pony's advantage over Erlang is that sending messages doesn't need a deep copy, because of the compile-time checks. So while you can make a deep copy if you want to, you don't have to if that will be detrimental to performance.

Cyclic references in garbage collection

Posted Jan 3, 2025 19:14 UTC (Fri) by kleptog (subscriber, #1183) [Link]

An interesting side effect of the Erlang approach is that in practice, most threads (actors) do not live long enough to trigger the garbage collector. When they die you can simply destroy the stack+heap of that thread since you know it's unreferenced.

Pony can't take advantage of this trick though, but their approach is novel, like a cross between Erlang and Rust.

You don't use Erlang for performance (though the JIT they're working on helps a lot), you use it because throughput scales linearly with the number of CPUs. And you can connect multiple machines into a single large cluster without changing your program.

Division by zero

Posted Jan 3, 2025 19:02 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (23 responses)

Pony defines division by zero to be zero: https://tutorial.ponylang.io/gotchas/divide-by-zero#divid...

That's an interesting choice. I'd want a trap or an abort most of the time in that situation instead of the wrong result.

Division by zero

Posted Jan 3, 2025 21:44 UTC (Fri) by Wol (subscriber, #4433) [Link] (22 responses)

I'd rather have a language that handled infinity ... although quite what happens when you get 0/0 I don't know - as I've been trying to fix at work recently - our spreadsheets didn't like the fact we had only one shift not two on several days over Christmas...

Makes debugging a pain when you need to mess about creating an unusual setup to try and replicate the problem. Oh well ...

Cheers,
Wol

Division by zero

Posted Jan 3, 2025 21:56 UTC (Fri) by dskoll (subscriber, #1630) [Link] (21 responses)

In IEEE 754 floating point, 0.0 / 0.0 gives you NaN if you don't trap it. Once an expression yields a NaN, the NaN "infects" every other operator... any operator involving a NaN returns the NaN.

But that's floating point. Most CPUs don't have a way to represent a NaN in a variable of integer type. (Nor Inf nor -Inf, though I guess you could co-opt INT_MAX and INT_MIN respectively.)
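
A minimal C sketch of the NaN behaviour described above. It assumes the platform's `double` follows IEEE 754 and that the code is not built with options such as `-ffast-math`; the function names are invented for this example. Note that the "infection" applies to arithmetic operators; comparisons involving a NaN simply evaluate to false.

```c
#include <math.h>
#include <stdbool.h>

/* volatile keeps the compiler from folding the division at compile time. */
double zero_over_zero(void)
{
    volatile double zero = 0.0;
    return zero / zero;  /* IEEE 754 invalid operation: default result is NaN */
}

/* Arithmetic on a NaN yields a NaN again. */
bool nan_propagates(void)
{
    double n = zero_over_zero();
    return isnan(n) && isnan(n + 1.0) && isnan(n * 0.0) && isnan(n - n);
}

/* A NaN compares unequal to everything, including itself. */
bool nan_is_unordered(void)
{
    double n = zero_over_zero();
    return !(n == n) && !(n < 1.0) && !(n > 1.0);
}
```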

Division by zero

Posted Jan 3, 2025 22:19 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (20 responses)

Right. NaN is at least "I don't know". It may be imprecise, but it's not wrong. Zero is mathematical nonsense: imagine 60/5, 60/4, 60/3, 60/2, and 60/1: the quotients are 12, 15, 20, 30, and 60. Now you go to 60/0 and get zero? And then 60/-1 goes to -60? Weird discontinuity. I'd rather abort the program if it tries to do something nonsensical like divide an integer by zero.

Division by zero

Posted Jan 3, 2025 22:29 UTC (Fri) by dskoll (subscriber, #1630) [Link] (19 responses)

Yes, I agree. And most CPUs will raise an exception on integer divide by zero. This little C program:

int main()
{
    int x = 0;
    int y = 0;
    int z = x/y;
    return 0;
}

Does this when you run it:

$ ./test
Floating point exception
$ echo $?
136

Division by zero

Posted Jan 4, 2025 9:56 UTC (Sat) by pm215 (subscriber, #98099) [Link] (18 responses)

Well, x86 does that. But the CPU architecture you're next most likely to be running on today (arm) does not. There the division instructions are defined to produce a zero result without generating an exception if you divide by zero. I believe powerpc, mips and riscv also do not generate exceptions for integer division by zero.

Division by zero

Posted Jan 4, 2025 13:06 UTC (Sat) by pizza (subscriber, #46) [Link] (7 responses)

> But the CPU architecture you're next most likely to be running on today (arm) does not.

Historically, Arm processors didn't actually implement a hardware integer divider, so any faults/exceptions had to be triggered by the low-level software runtime library.

But starting with armv7-m and armv8, there are integer divide instructions, and they can generate divide by zero exceptions -- something I can personally attest to triggering numerous times. Here is the documentation on the UDIV/SDIV instructions on armv7-m:

https://developer.arm.com/documentation/ddi0403/d/Applica...

Meanwhile, IIRC the various Arm FPUs always supported faults/exceptions, including for divide-by-zero.

Division by zero

Posted Jan 4, 2025 18:03 UTC (Sat) by khim (subscriber, #9252) [Link] (4 responses)

> But starting with armv7-m and armv8, there are integer divide instructions

Yes.

> and they can generate divide by zero exceptions

No. Take your own link that, with one click, tells us how SDIV works on ARM 7-m. It looks like this:

if ConditionPassed() then
    EncodingSpecificOperations();
    if SInt(R[m]) == 0 then
        if IntegerZeroDivideTrappingEnabled() then
            GenerateIntegerZeroDivide();
        else
            result = 0;
    else
        result = RoundTowardsZero(SInt(R[n]) / SInt(R[m]));
    R[d] = result<31:0>;

Now compare that to the ARMv8 specification:

constant bits(datasize) operand1 = X[n, datasize];
constant bits(datasize) operand2 = X[m, datasize];
constant integer dividend = SInt(operand1);
constant integer divisor = SInt(operand2);
integer result;
if divisor == 0 then
     result = 0;
elsif (dividend < 0) == (divisor < 0) then
     result = Abs(dividend) DIV Abs(divisor); // same signs - positive result
else
     result = -(Abs(dividend) DIV Abs(divisor)); // different signs - negative result
X[d, datasize] = result<datasize-1:0>;

See that call to GenerateIntegerZeroDivide? I don't see it either.

Just why ARM decided that the embedded version has to have a “division by zero” exception while “big” CPUs have to silently produce zero is a good question, but that's how things work currently.

P.S. Of course, if you recall that RISC-V does things differently, and x86 too… then the question of codifying ARM-specific behavior in the language arises… but I guess it's one of these “looked good at the time” ideas.

Division by zero

Posted Jan 4, 2025 18:10 UTC (Sat) by dskoll (subscriber, #1630) [Link] (1 responses)

Huh, you are right! Even though my Pi 4 is running an aarch64 kernel, userspace is 32-bit armhf and the test program raised SIGFPE. I tried the test program on a fully 64-bit Pi 4 with 64-bit userspace and it ran without complaint, assigning 0 to z.

Division by zero

Posted Jan 4, 2025 20:34 UTC (Sat) by pm215 (subscriber, #98099) [Link]

My guess is that your compiler is generating code that assumes the CPU doesn't implement the division instructions (because armhf includes v7-without-VE CPUs in its remit) and instead calls into the gcc runtime, and the runtime function is then manually raising a SIGFPE. (Possibly also the compiler figures out that it is a div by zero and generates a call to the div-by-zero runtime function.)

You can probably pass the compiler some kind of -march or -mcpu options to tell it to generate code assuming the v8 CPU you have, and then it ought to emit the udiv or sdiv inline, if you want to look at the behaviour in that situation.

Division by zero

Posted Jan 5, 2025 0:05 UTC (Sun) by pm215 (subscriber, #98099) [Link] (1 responses)

Re "codifying ARM-specific behavior in the language", it isn't necessarily that. The Pony FAQ justifies why you would want it to return *some value*, and having it return a known value rather than a random one is sensible for reproducibility. Then, you could pick anything, so why *not* 0? It makes no difference to the "branch away if divisor is zero" code you need to generate (and if you're really into shaving cycles then 0 has the advantage that you know for certain you already have it in the input value so you can copy it to the output and don't need to emit code to generate a fresh constant value).

Apparently various theorem provers also define their division this way:
https://xenaproject.wordpress.com/2020/07/05/division-by-...
and https://www.ponylang.io/blog/2017/05/an-early-history-of-... suggests that the inventor of Pony has some background in that kind of academic type theory/theorem proving area, so it seems plausible that the motivation for choosing 0 might have been influenced by existing languages like that. Early 2010s seems a bit early for it to be very likely that a language designer was much influenced by fine details of codegen for Arm.
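
The "branch away if divisor is zero" code mentioned above can be sketched in C as follows. This is only an illustration of a total division function, not Pony's actual code generation; `div_or_zero` is a hypothetical name. Returning 0 for `INT32_MIN / -1` is this example's own choice, made to avoid undefined behaviour in C, and is not a claim about what Pony does in that case.

```c
#include <stdint.h>

/* Total signed division: defined for every pair of inputs. */
int32_t div_or_zero(int32_t dividend, int32_t divisor)
{
    /* The divisor is already known to be zero here, so it can be
       returned directly instead of materialising a fresh constant. */
    if (divisor == 0)
        return divisor;

    /* Assumption of this sketch: the single overflowing case also
       yields 0, since INT32_MIN / -1 is undefined behaviour in C. */
    if (dividend == INT32_MIN && divisor == -1)
        return 0;

    return dividend / divisor;
}
```

With this definition, `div_or_zero(60, 5)` is 12, `div_or_zero(60, 0)` is 0, and `div_or_zero(60, -1)` is -60, on every architecture, whether or not the hardware traps.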

Division by zero

Posted Jan 6, 2025 0:38 UTC (Mon) by Heretic_Blacksheep (guest, #169992) [Link]

Mathematical correctness doesn't matter here because division by zero is undefined. You can literally make it anything you want - and I've seen math professors do so as an esoteric joke. However, programmers do not like non-deterministic error states. Pony makes the outcome deterministic, regardless of hardware traps that may or may not exist, allowing for a simple error check state across all architectures which the programmer can define as fatal or non-fatal depending on their program logic then handle it accordingly.

Given the alternatives, I'd much rather have a 'mathematically incorrect' deterministic result that I can take to the bank rather than depending on disparate architectural dependent error states that are really just as mathematically arbitrary as assigning X/0 == 0 but a lot more messy to deal with.

Division by zero

Posted Jan 4, 2025 20:19 UTC (Sat) by pm215 (subscriber, #98099) [Link]

Integer divide in A-profile comes in with the Cortex-A15 and A7, in v7VE, so some v7A cores have it, just not all. V7R requires it for the Thumb insn set but not the Arm one (!)

And yeah, as noted in the sibling comments M-profile has configurable trapping of integer division by zero, but A profile does not. (M profile diverges from A in various more or less obvious ways.) R profile also permits configuring trapping on div by zero. A profile never traps.

The v7A/R Arm ARM has a section that describes the various options:

https://developer.arm.com/documentation/ddi0406/c/Applica...

v8 got to clean this up by just having them be always present.

Division by zero

Posted Jan 4, 2025 20:54 UTC (Sat) by pm215 (subscriber, #98099) [Link]

By the by, on the floating point side of the house Arm has always supported setting the appropriate floating point status flag bit for fp division by zero, since this is a requirement of IEEE 754. Support for trapping (i.e. generating a CPU exception into the kernel instead of setting the status flag bit) is an implementation defined choice, though -- not many implementations choose to implement it (though I'm aware of at least one which does). As far as I know the only way to detect trapping support is to write a 1 to the FPCR/FPSCR "enable traps for this exception" bit and see if it reads back as zero or one -- if the implementation doesn't implement trapping then the bit will be RAZ/WI.

Division by zero

Posted Jan 4, 2025 13:22 UTC (Sat) by dskoll (subscriber, #1630) [Link] (6 responses)

I get the same SIGFPE on a Raspberry Pi 4 with an aarch64 kernel.

Division by zero

Posted Jan 4, 2025 18:09 UTC (Sat) by khim (subscriber, #9252) [Link] (5 responses)

SIGFPE sounds suspiciously like result of floating-point operation. We are talking about integer division here.

Division by zero

Posted Jan 4, 2025 18:12 UTC (Sat) by dskoll (subscriber, #1630) [Link] (2 responses)

Yes, I know. Nevertheless, SIGFPE is the signal that gets raised.

$ cat /tmp/test.c
#include <stdio.h>
int main()
{
    int x = 0;
    int y = 0;
    int z = x/y;
    printf("z = %d\n", z);
    return 0;
}

$ strace /tmp/test
[... bunch of stuff elided ...]
--- SIGFPE {si_signo=SIGFPE, si_code=FPE_INTDIV, si_addr=0x55c090d59153} ---
+++ killed by SIGFPE +++
Floating point exception

Division by zero

Posted Jan 4, 2025 22:56 UTC (Sat) by khim (subscriber, #9252) [Link] (1 responses)

How do you compile that? GCC is smart enough to recognize UB and produce appropriate brk if optimizations are enabled, but that's not related to what CPU is doing. And unoptimized version calls function that can check the divisor.

Try to invoke sdiv directly, then CPU should do what it does. At least for me it produces zero, as CPU manual promised.

Division by zero

Posted Jan 4, 2025 23:18 UTC (Sat) by dskoll (subscriber, #1630) [Link]

I compiled in both cases with make test which simply invoked gcc with no optimization. I checked the assembly output and you are right. On the armhf architecture, it looks like gcc calls into a library function:

        bl      __aeabi_idiv

but on the aarch64 architecture, it calls an assembly instruction:

        sdiv    w0, w1, w0

Division by zero

Posted Jan 4, 2025 19:15 UTC (Sat) by excors (subscriber, #95769) [Link] (1 responses)

glibc says: (https://sourceware.org/glibc/manual/2.40/html_node/Progra...)

> The SIGFPE signal reports a fatal arithmetic error. Although the name is derived from “floating-point exception”, this signal actually covers all arithmetic errors, including division by zero and overflow.

(It notes the integer overflow exception is "impossible in a C program unless you enable overflow trapping in a hardware-specific fashion".)

Division by zero

Posted Jan 4, 2025 22:58 UTC (Sat) by khim (subscriber, #9252) [Link]

I'm surprised not by SIGFPE per se, but by the fact that SDIV suddenly started producing exceptions. Raspberry Pi 4 uses ARM Cortex-A72 which uses ARMv8-A and on ARMv8-A result should be zero, not exception.

If you avoid UB and functions that do manual checks, at least.

Division by zero

Posted Jan 6, 2025 15:52 UTC (Mon) by paulj (subscriber, #341) [Link] (2 responses)

Oh, interesting. Is there any way to enable signal or some exception bit to detect this? (Like FP has, per standard)

Division by zero

Posted Jan 6, 2025 17:39 UTC (Mon) by farnz (subscriber, #17727) [Link] (1 responses)

Not for the integer UDIV and SDIV instructions. You can see in the "Operation" section (which uses Arm Pseudocode to describe the behaviour in abstract terms) that the instruction unconditionally outputs a 0 result if the input divisor is 0.

Division by zero

Posted Jan 7, 2025 11:01 UTC (Tue) by paulj (subscriber, #341) [Link]

Hmm, ouch.

GCC seems to have some useful options to help catch int-div-0 in testing, including for its analyzer and sanitizer, and also a -mcheck-zero-division - which is enabled by default for -O0 and -Og.

First time I hear of Erlang as actor based

Posted Jan 3, 2025 19:33 UTC (Fri) by martin.langhoff (subscriber, #61417) [Link] (6 responses)

I've programmed in Erlang, and I've seen several actor based software designs, in other languages. I had never heard of Erlang being actor based.

Erlang's data is immutable, and all data shared across threads/processes is copied (and all data shared within a process is immutable anyway) so races and GC are not an issue.

I guess what I'm trying to say is: sounds like Pony has an interesting approach to parallel processing, but I can't quite 'get' what it is from reading the article. The actor model might be a distraction here, not clear to me how it connects with the problem at hand.

First time I hear of Erlang as actor based

Posted Jan 4, 2025 16:25 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

> I guess what I'm trying to say is: sounds like Pony has an interesting approach to parallel processing, but I can't quite 'get' what it is from reading the article. The actor model might be a distraction here, not clear to me how it connects with the problem at hand.

As someone who has never used Erlang: After reading some of Pony's basic entry-level documentation, I do not understand how I could conceptualize this language *without* using the actor model to do so. Literally all of its parallel machinery is defined in terms of the actor model (at least according to the documentation, anyway). Maybe there is another way of thinking about it, but I don't see what that way looks like.

First time I hear of Erlang as actor based

Posted Jan 5, 2025 12:41 UTC (Sun) by kleptog (subscriber, #1183) [Link] (3 responses)

I think the issue is that while Erlang lends itself to the Actor model with the way it uses processes and messages, it was never the focus. On the other hand, Elixir takes the Actor model and runs with it. It has a GenServer which provides a way to build Actors that are reliable and scalable. [1] And then whole supporting libraries are built using it (e.g. Phoenix).

So the comparison of Pony with Erlang is a bit off because Erlang was not built with actors in mind, even if it can be quite effective in implementing them. In a sense, Pony goes fully the other way: (AIUI) everything is an actor, while in most Elixir programs you'll have a bunch of actors for handling requests and database connections, but most of the code is just run-of-the-mill supporting code.

Making actor a first-class type in a programming language is definitely novel.

[1] https://underjord.io/unpacking-elixir-the-actor-model.html

First time I hear of Erlang as actor based

Posted Jan 5, 2025 17:08 UTC (Sun) by martin.langhoff (subscriber, #61417) [Link] (1 responses)

I'm glad the community that likes Actor model can apply it in Elixir. I still don't see any indication that Elixir was designed with the Actor model in mind.

And that's fine. I was for a moment scratching my head, thinking I had missed some Elixir fundamental during the years I used it. I don't think I did.

First time I hear of Erlang as actor based

Posted Jan 6, 2025 8:18 UTC (Mon) by kleptog (subscriber, #1183) [Link]

I think it's a semantic discussion really.

Elixir was started by a Ruby developer who found the Ruby VM too limiting, saw the Erlang VM and figured they could combine the Ruby syntax with Erlang/OTP. The ability for Elixir to support the Actor model so well is a property of the VM, not the language. Does that mean Elixir was designed for the Actor model, or is it just happy chance that it works so well? Does it matter?

Elixir has definitely made Erlang/OTP much more accessible because Erlang's native language is Prolog-like which is, umm, an acquired taste and not commonly taught. Ruby syntax is not C or Python, but close enough to be relatively easy to follow for people from those languages.

[1] https://www.freshcodeit.com/blog/what-is-elixir-and-why-d...

First time I hear of Erlang as actor based

Posted Jan 7, 2025 17:01 UTC (Tue) by chris.sykes (subscriber, #54374) [Link]

Just for the record: Elixir's GenServer module is the equivalent of the gen_server behaviour in Erlang, not an Elixir invention.

First time I hear of Erlang as actor based

Posted Jan 6, 2025 17:00 UTC (Mon) by paulj (subscriber, #341) [Link]

The word "actor" doesn't appear in Armstrong's thesis, but Erlang clearly is an actor based language. He calls it a "Concurrency Orientated PL" (COPL) which must:

1. "... [S]upport processes. A process can be thought of as a self-contained virtual machine"
2. "Several processes operating on the same machine must be strongly isolated. A fault in one process should not adversely effect another process, unless such interaction is explicitly programmed."
..
4. "There should be no shared state between processes. Processes interact by sending messages."

Clearly the actor model.

Pony influenced "inko"

Posted Jan 6, 2025 12:58 UTC (Mon) by evomassiny (subscriber, #161550) [Link]

> The language is not likely to overtake other more popular programming languages, but its ideas could be useful for other languages or frameworks struggling with concurrent data access.

I believe that the (experimental) inko [1] programming language was inspired by Pony's design; it also features a special type for pointers to isolated data, and the ability to send it to an actor without copying.

I've toyed a bit with inko and I like it very much. If you know a bit of Rust, it feels quite ergonomic: it has algebraic data types, generics, and a similar model of ownership (albeit relaxed).

It also compiles to binary (using LLVM).

[1]: https://inko-lang.org/

Actor based development

Posted Jan 6, 2025 16:13 UTC (Mon) by nsheed (subscriber, #5151) [Link]

I spent quite a few years building enterprise integrations using proprietary tooling that followed an actor-based pattern. Coming from a procedural background, the mental gear crunching during the first few months, as I tried to implement solutions my way, was painful to go through. But once the penny drops and your perspective shifts, you never quite seem to see a problem/solution design the same way again.

Chromium sequence

Posted Jan 7, 2025 5:56 UTC (Tue) by ibukanov (subscriber, #3942) [Link]

Chromium's C++ codebase extensively uses the notion of sequences, which is similar to actors in Pony. As in Pony, a sequence has a message queue, and messages are always processed strictly sequentially. Thus the code does not need to synchronize even if different messages may be dispatched on different hardware threads.

The communication between sequences is done using one-way asynchronous message passing. However Chromium includes a lot of helpers to post a message to another sequence and then post back the result.

Chromium encourages sending messages by value using move semantics. But one can also send reference-counted things. Of course, as this is C++, there are provisions to post messages with raw pointers and references, but at least such code must be explicitly annotated, and comments must explain how this is safe.

Overall I have found it easy to follow and reason about such code. In fact, the explicit notion of the event queue and of posting messages processed by class methods made the code's intentions very explicit. I definitely prefer this actor style over Golang goroutines and channels or even async/await in Rust or JavaScript.
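
To make the idea of a sequence concrete, here is a deliberately tiny, single-threaded C sketch. It is not Chromium's API: every name in it (`struct sequence`, `sequence_post`, `sequence_run_pending`, and so on) is hypothetical, and a real implementation would need a thread-safe queue so that other threads could post to it. What it illustrates is only the core property: tasks posted to one sequence run strictly one after another, so the state they touch needs no locking of its own.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16

typedef void (*task_fn)(void *arg);

struct task {
    task_fn fn;
    void *arg;
};

/* A sequence: a FIFO of tasks that are run strictly in order. */
struct sequence {
    struct task queue[QUEUE_CAPACITY];
    size_t head;   /* index of the next task to run */
    size_t count;  /* number of queued tasks */
};

/* Queue a task; returns false if the fixed-size queue is full. */
bool sequence_post(struct sequence *s, task_fn fn, void *arg)
{
    if (s->count == QUEUE_CAPACITY)
        return false;
    size_t tail = (s->head + s->count) % QUEUE_CAPACITY;
    s->queue[tail].fn = fn;
    s->queue[tail].arg = arg;
    s->count++;
    return true;
}

/* Run queued tasks one at a time, including tasks posted while running.
   Returns the number of tasks that were executed. */
size_t sequence_run_pending(struct sequence *s)
{
    size_t ran = 0;
    while (s->count > 0) {
        struct task t = s->queue[s->head];
        s->head = (s->head + 1) % QUEUE_CAPACITY;
        s->count--;
        t.fn(t.arg);
        ran++;
    }
    return ran;
}

/* Example state that is only ever touched by tasks on one sequence. */
struct counter {
    int value;
};

static void increment(void *arg)
{
    struct counter *c = arg;
    c->value++;
}

/* Posts three increments and runs them; returns the final counter value. */
int demo_counter(void)
{
    struct sequence seq = {0};
    struct counter c = {0};
    for (int i = 0; i < 3; i++)
        sequence_post(&seq, increment, &c);
    sequence_run_pending(&seq);
    return c.value;
}
```

The "post a message and post back the result" helpers mentioned above would, in this model, amount to a task that finishes by posting a reply task onto the caller's own sequence.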