Zig explores structured concurrency
Version 0.16.0 of the Zig programming language was recently announced, and with it an expanded version of the new Io interface that we covered in December. The new interface is based on an idea called structured concurrency that makes writing correct concurrent applications easier. Zig's implementation of the idea is more explicit and verbose than those of other languages, however, which could offer an opportunity to explore the consequences of different designs.
Asynchronous background
Rather than special syntax to support asynchronous functions, Zig now has a new interface (i.e., a structure full of function pointers) that can be used to invoke functions asynchronously or concurrently. That distinction is important: the provided async() function invokes a callback that can be run simultaneously in another thread, but the concurrent() function invokes a callback that must be run in another thread. Like Zig's existing allocator API, an instance of the Io interface is passed around to functions that need to perform I/O operations, including introducing asynchronicity.
Both async() and concurrent() return a "future" (an object of type Future). Those objects have an await() method that waits for them to complete, and a cancel() method that cancels the future if possible. The methods are idempotent — it is legal to call await() multiple times, or to call cancel() on a future that has already been waited on. Similarly, both methods return the value of the future if it has already finished. This design makes it relatively simple to clean up any allocations or other resources associated with a future, because it is always permissible to cancel a future and clean up whatever the cancel() function returns, regardless of the future's state.
The documentation gives an example of what this could look like in a real program. The example asynchronously invokes a function foo() that returns some kind of resource once the task is finished. If the task is canceled before it can finish, there is no resource to free; the code as written expects an error in that case, which it checks for using the fact that if statements in Zig can directly unwrap an error union, binding the payload with |resource| on success and discarding the error with else |_|. The defer statement ensures that the cleanup is run when the current function exits.
var foo_future = io.async(foo, .{foo_args, ...});
defer if (foo_future.cancel(io)) |resource| resource.free() else |_| {}
When the function containing this code exits, the future will be canceled, which does nothing if it was already complete. Any returned resources are cleaned up. Compare this to Rust's version of the same concept:
let foo_future = foo(foo_args, ...);
That Rust code does the same thing: when the function ends, the future will be canceled and any returned resources will be cleaned up. The Rust version has less boilerplate, although that does also mean that the cleanup code has to be generic, and can't be customized for a particular situation (or perform its own asynchronous I/O). The implicit design has given Rust some sharp edges related to canceling asynchronous tasks. The ability to customize cleanup code gives Zig an easier way to encapsulate a particular pattern for asynchronous code into a library.
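The cancel-then-clean-up pattern is not specific to Zig. As a rough analogy only, here is a minimal sketch using Python's standard asyncio module; it is not Zig's API. The Resource class and the foo(), cancel_and_collect(), and use_resource() functions are hypothetical names invented for illustration, and try/finally stands in for defer.

```python
import asyncio
from typing import Optional


class Resource:
    """Hypothetical resource that records whether it was released."""

    def __init__(self) -> None:
        self.freed = False

    def free(self) -> None:
        self.freed = True


async def foo(delay: float) -> Resource:
    """Hypothetical task: produce a resource after `delay` seconds."""
    await asyncio.sleep(delay)
    return Resource()


async def cancel_and_collect(task: asyncio.Task) -> Optional[Resource]:
    """Cancel `task`; return its result if it had already finished, else None."""
    task.cancel()  # has no effect when the task is already done
    try:
        return await task
    except asyncio.CancelledError:
        # Simplification: a fuller version would also check whether the
        # caller itself was cancelled, rather than assuming it was `task`.
        return None


async def use_resource(delay: float) -> Optional[Resource]:
    """Start foo(), do other work, then always cancel and clean up."""
    task = asyncio.create_task(foo(delay))
    resource = None
    try:
        await asyncio.sleep(0.05)  # stand-in for the function's real work
    finally:
        # Safe regardless of the task's state, like the Zig defer above.
        resource = await cancel_and_collect(task)
        if resource is not None:
            resource.free()
    return resource


if __name__ == "__main__":
    finished = asyncio.run(use_resource(0.0))
    print("finished early, freed:", finished is not None and finished.freed)
    print("cancelled result:", asyncio.run(use_resource(10.0)))
```

As in the Zig example, the cleanup path does not need to know whether the task had finished: cancelling a completed task is harmless, and any resource that comes back is released.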
Structured concurrency
The basic design described above is no different from what was previously reported. What is new is a set of standard-library modules for handling asynchronous operations and documentation on how to use this new interface to avoid some common problems with asynchronous code in other languages. The Io.Group interface, for example, allows managing a batch of asynchronous tasks that all share the same lifetime. The tasks added to a group are all canceled or waited on at once. The Io.Select interface builds on top of that to make it possible to start many operations in parallel and wait for the first to finish, canceling the others.
This approach is not exactly new. Erlang has offered various ways to structure asynchronous and concurrent operations since the 1980s, for example, although its ideas have not penetrated the mainstream. More recently, programming-language designers have begun experimenting with the idea of structured concurrency as a way to handle the complexity of asynchronous operations. Structured concurrency was first outlined by Martin Sústrik in 2016. It suggests applying the same discipline to concurrent processes that Edsger Dijkstra's structured programming did to the use of goto statements: restrict the shape of programs such that the control flow is clear from examining the structure of the source code. The idea was later expanded on by the Trio Python library, and eventually adopted by the Swift and Kotlin programming languages as part of their standard libraries. Other languages, such as Rust, have third-party libraries inspired by the same idea.
Each of these takes on the idea implements a slightly different interface, but the basic idea of a "group" of tasks that are all awaited as one is present in all of the implementations (Trio calls it a "nursery", and Kotlin calls it a "coroutine scope"). This is a relatively straightforward notion, but it's easy to see how it makes the relationship between the lifetimes of asynchronous tasks simpler to reason about. Other parts of the designs have less in common. Task cancellation, for example, is something that every language handles differently.
Zig, as explained above, handles task cancellation explicitly, with the cancel() method. Kotlin does the same, except that it propagates cancellation through an entire tree of tasks, instead of requiring the programmer to decide explicitly how cancellation should propagate. Rust's task cancellation is implicit: a task is effectively canceled as soon as no other code is interested in polling for its result. Trio relies on Python's support for exceptions; each task periodically checks whether it has been canceled, and if so, it raises a Cancelled exception. Swift uses a system much like Trio's, except that the type system ensures that CancellationError exceptions can only come from specific, annotated points.
To some extent, this kind of variation is inevitable, since the different languages have their own design aesthetics and priorities. But cancellation is such an important component of writing correct asynchronous programs that the inconsistencies between languages prevent them from really agreeing on a common definition for "structured concurrency". For example, if one task in a group runs into an error, should the entire group be canceled? In Trio, the answer is "yes", although the library allows programmers to separate out "cancellation scopes" from nurseries to exercise finer control. Swift and Kotlin don't offer that affordance, but they do differ in how exceptions are propagated through a tree of asynchronous tasks. Zig leaves it all up to the programmer. When every implementation disagrees about how to handle cancellations of tasks in a group, it makes it hard to claim that structured concurrency is really the same between languages.
Structured programming was a wild success, to put it mildly. Almost every programming language has a concept of an if statement or a while loop, and they nearly all work the same way. Some languages make these constructs into expressions, but the basic control flow remains the same. For loops are more complicated, and the details frequently vary from language to language. The languages that lack these constructs (for example, Prolog) are typically outliers in other ways as well. In a real way, structured programming successfully identified a core set of control-flow primitives that were so useful and versatile that they were adopted across the world of programming.
Structured concurrency does not, so far, match up. It's clearly a useful idea, given that the number of implementations has been slowly growing over the past decade. And yet those implementations agree on few details beyond the core idea of a group of asynchronous tasks. Designs and intuitions built up with Zig's implementation do not transfer cleanly to Swift, or to Kotlin, and vice versa.
Zig's take on the idea is more verbose and explicit than those of the other languages, which is both a blessing and a curse. On the one hand, Zig programmers will have to deal with more boilerplate. On the other hand, Zig's explicitness makes it possible to experiment with details such as how results are stored, how cancellation is handled, and how these things are packaged up into a usable interface. Zig is a rapidly evolving language, and the developers have no problem with changing the standard-library APIs when better ideas come along, at least until version 1.0.0 arrives, which still appears to be some way off. Perhaps Zig's adoption of structured concurrency in the standard library could provide an interesting playground to experiment with variations on the design. On the off chance that there is a design as timeless and universal as the while loop waiting to be found, the investigation would be well worth it.
