JDK 21 released

[Posted September 19, 2023 by corbet]

JDK 21, the reference implementation of the Java 21 language specification, has been released. "This release includes fifteen JEPs [1], including the final versions of Record Patterns (440), Pattern Matching for switch (441), and Virtual Threads (444)".

JDK 21 released

Posted Sep 20, 2023 2:07 UTC (Wed) by calumapplepie (guest, #143655) [Link] (23 responses)

Virtual Threads is big; it gives Java a modern concurrency tool

JDK 21 released

Posted Sep 20, 2023 7:55 UTC (Wed) by vasvir (subscriber, #92389) [Link] (21 responses)

I agree, but I just can't seem to shake a weird feeling of deja vu.

In several languages, even in Java (pre-1.2), there were green threads. Do you remember the M:N discussions of the early 2000s?

Now it looks like we have more variants everywhere: system threads, green threads, fibers, coroutines and even async models of execution (JS).

JDK 21 released

Posted Sep 20, 2023 12:45 UTC (Wed) by paulj (subscriber, #341) [Link] (2 responses)

It is funny indeed. I'm /very/ sure there were many debates inside Sun Microsystems even on the best threading model for Solaris, motivated in great part by the JVM. I think Solaris even did have an M:N thread model for a while, with M threads in libc mapped onto N OS threads. It was complex and problematic, and eventually removed, because it was just simpler and more efficient (and less buggy!) to have /1/ thread management system (scheduler, etc.) in the OS.

JDK 21 released

Posted Sep 20, 2023 14:13 UTC (Wed) by nim-nim (subscriber, #34454) [Link] (1 responses)

> I'm /very/ sure there were many debates inside Sun Microsystems even
> on the best threading model for Solaris, motivated in great part by the JVM.

I’m quite sure there were many such debates, and I’m quite sure they were not motivated by the JVM.

There was a lot of bad blood and turf wars between Solaris engineers and Java engineers at SUN, which explains why “write once run everywhere” quickly degenerated into “pathologically unable to integrate with any system service, especially Unix-side”.

There is no way in hell Solaris engineers would have let Java engineers define what the Solaris threading model should look like, and there is no way in hell Java engineers would have felt constrained by the choices Solaris made.

From the Java side of the equation Solaris was a more and more marginalised OS and they had no intention of sinking with it, placing their bets on Windows and then (with a little IBM/BSD prodding) on Linux. Plus there was the great hope of eventually rewriting everything, OS included, the right way (in Java).

From the Solaris side of the equation Java was this thing that made so many wrong choices (from an OS and performance point of view) and dared dictate how the OS should behave. Plus they were traitors that helped move software to competitor OSes.

And SUN management failed to articulate a vision that would have made those bright minds achieve success together.

JDK 21 released

Posted Sep 20, 2023 14:21 UTC (Wed) by paulj (subscriber, #341) [Link]

Solaris engineers I'm sure wouldn't have let JVM decide the Solaris threading model. That's not at all at odds with Solaris engineers taking into account how /they/ thought users like JVM could be best served by the threading support in Solaris.

Looking it up, the M:N threading was in the original thread library (Solaris 2.5 or 2.6?), which was deprecated around Solaris 9 in favour of the T2 1:1 library, and removed for Solaris 10.

JDK 21 released

Posted Sep 20, 2023 13:28 UTC (Wed) by gus3 (guest, #61103) [Link]

"You can do this in a number of ways. IBM chose to do all of them. Why do you find that funny?" -- D. Taylor, Computer Science 350, University of Washington

(from the fortunes database)

JDK 21 released

Posted Sep 20, 2023 13:32 UTC (Wed) by MarcB (guest, #101804) [Link] (15 responses)

I think there is some disillusionment with async APIs; async applications can quickly become hard to understand or debug.
One thread per request (or connection) seems to be a much simpler model, but it can have too much overhead on a 1:1 implementation.

JDK 21 released

Posted Sep 20, 2023 14:29 UTC (Wed) by paulj (subscriber, #341) [Link] (13 responses)

At best, it's swapping state overhead for compute overheads. I.e., if the OS thread carries more state than is needed, then you take the compute hit of syncing the subset of state required in user-space to/from the running thread as and when your userspace scheduler swaps lwts on the running thread.

There are also issues of preemption, blocking, etc.

I wonder what kind of state a user-space lwt scheduler would /not/ need that the OS does, that gives wins here?

Intuitively, a problem-domain specific work-task:OS-thread multi-plexer seems like it could give scalability wins, but I'm less convinced about M:N lwt:OS threads - if that were such an easy win, why did Linux and Solaris go for 1:1 after much deliberation over and (in case of Solaris) experience with M:N models?

JDK 21 released

Posted Sep 20, 2023 16:07 UTC (Wed) by farnz (subscriber, #17727) [Link] (3 responses)

The only bit of state that the kernel scheduler must deal with, but a user scheduler can potentially ignore, is the CPU context (registers etc).

However, back when threads first became a significant part of OSes, kernels had a lot of state associated with each thread. This made M:N models look very attractive, since the user scheduler was being built up with the bare minimum of state, not being cut down from the process scheduler (where the cost of switching MMU setups was high enough to hide the cost of handling excess state).

JDK 21 released

Posted Sep 21, 2023 10:19 UTC (Thu) by paulj (subscriber, #341) [Link] (2 responses)

struct task_struct on Linux is actually huge, with the CPU state in a smaller (but still largish) struct thread_struct embedded at the end. A lot of the stuff in task_struct seems related to debugging systems - in-kernel tracing mechanisms, and also user-space ptrace. Then there are all kinds of book-keeping and references needed for security and resource-allocation/separation stuff.

A good chunk of the kernel sub-system and book-keeping could be saved by LWTs I guess. But... then you lose stuff like... NUMA balancing, separation between threads (e.g. resources via cgroups, security via LSM). However, you could argue: Who on earth would design a process that relied on different traditional threads within the process having different resource or security contexts?

I guess the argument could be made that the common case of traditional threads (i.e. shared memory, shared resources, shared everything bar a few contexts like CPU regs, stack) should be handled specially and made much lighter-weight in the kernel. I.e. a thread should be able to run with just the stuff from thread_struct - it shouldn't need a whole task_struct for each thread?

In the context of this article, Java and its M:N lwts, it is of course not trying to present a full Linux or POSIX thread interface to user-space, so that's a much "easier" job.

JDK 21 released

Posted Sep 21, 2023 11:08 UTC (Thu) by MarcB (guest, #101804) [Link] (1 responses)

> A good chunk of the kernel sub-system and book-keeping could be saved by LWTs I guess. But... then you lose stuff
> like... NUMA balancing, separation between threads (e.g. resources via cgroups, security via LSM). However, you
> could argue: Who on earth would design a process that relied on different traditional threads within the process
> having different resource or security contexts?

Exactly, the idea here explicitly is not to replace "real" threads. It is intended as an alternative to asynchronous APIs that would all be used from a single OS thread anyway.

The primary goal, according to https://openjdk.org/jeps/444, is to "Enable server applications written in the simple thread-per-request style to scale with near-optimal hardware utilization."
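
As a rough illustration of the thread-per-request style that the JEP targets, here is a minimal sketch, assuming JDK 21 or later. The class name `ThreadPerRequestSketch`, the `handle` method, and the 50 ms sleep standing in for blocking I/O are all made up for the example; only `Executors.newVirtualThreadPerTaskExecutor()` and `Thread.sleep(Duration)` are JDK APIs.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPerRequestSketch {

    // Hypothetical request handler: blocks for a while, roughly as a
    // database or HTTP call would, then returns a result.
    static int handle(int requestId) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50));
        return requestId * 2;
    }

    // Submits one task per request; the executor starts a new virtual
    // thread for every submitted task. Returns the sum of all results.
    static long serve(int requests) throws Exception {
        long sum = 0;
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                results.add(executor.submit(() -> handle(id)));
            }
            for (Future<Integer> result : results) {
                sum += result.get();
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        // 10,000 concurrent blocking "requests", written in plain blocking style.
        System.out.println(serve(10_000));
    }
}
```

The handler is ordinary blocking code; whether it scales depends on the blocking calls inside it being ones the JDK can park a virtual thread on, which is the topic of the sub-thread further down.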

JDK 21 released

Posted Sep 21, 2023 13:14 UTC (Thu) by paulj (subscriber, #341) [Link]

Yeah, could make sense for Java.

The other interesting question here is to ask whether the kernel should have much lighter threads. Full tasks come with an _immense_ amount of baggage, for separation/security/etc. It'd be nice to have in-kernel light-weight threads, that just do the minimum needed for share-everything-bar-CPU-and-stack with POSIX thread APIs.

JDK 21 released

Posted Sep 20, 2023 21:12 UTC (Wed) by dcoutts (guest, #5387) [Link] (1 responses)

> I wonder what kind of state a user-space lwt scheduler would /not/ need that the OS does, that gives wins here?

There are examples out there. Notably, GHC (the primary Haskell compiler) supports light-weight pre-emptive threads which are much cheaper than OS threads, both in memory and context switching time. It has a simple thread scheduler in the RTS. I'm not sure exactly what state it saves vs an OS thread mainly because I'm not sure exactly what state a Linux thread carries, but a GHC thread is just a small struct and a small initial stack (less than 4k).

This means one can get great performance from the simple classic server design of one thread per client. 10k or 100k such threads is perfectly reasonable.

The main difficulty with libc M:N schedulers is pre-emption: usually it being expensive or complex or both. There's quite a bit of academic research that looks at solutions involving compiler support for inserting cheaper pre-emption points. This is the approach that GHC takes too.

For example: https://dl.acm.org/doi/abs/10.1145/2400682.2400695

And also advocated by: https://www.usenix.org/legacy/events/hotos03/tech/full_pa...

JDK 21 released

Posted Sep 21, 2023 1:51 UTC (Thu) by wahern (subscriber, #37304) [Link]

> The main difficulty with libc M:N schedulers is pre-emption: usually it being expensive or complex or both.

Relatedly, O(1) event polling APIs like kqueue, epoll, and Solaris Ports[1] didn't exist back when M:N threading was being debated. Those APIs don't solve pre-emption, but without such APIs the ability to spawn hundreds or thousands of threads in a network server didn't make any sense--a scheduler relying on select/poll had little chance of "C10K" scaling.[2][3] Such O(1) event interfaces were a necessary, if crude, facility for user land scheduling to be meaningfully competitive with 1:1 threads for any real-world work loads. The fact that they composed well (i.e. they're built around a special file descriptor, rather than some global facility or state) also made it easier for user land developers, including language implementors, to experiment.

NetBSD, like Solaris before it, did end up with scheduler activations to support its M:N threading model, but I assume it was too little, too late. And AFAIU its event reporting was based on signals (similar to SIGIO/SIGPOLL), which effectively meant it was an interface only NetBSD system developers could iterate on and make use of. (I'm also not sure what types of events it reported. Did it even resolve the network connection scaling problem? Was Solaris' /dev/poll interface an offshoot of its scheduler activation framework?)
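
To make the readiness-notification idea concrete in Java terms, here is a minimal sketch using `java.nio.channels.Selector`, which on Linux is, as far as I know, normally backed by epoll. The class and method names (`ReadinessSketch`, `roundTrip`) are invented for the example, and an in-process `Pipe` stands in for a network connection so that the code needs no sockets.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class ReadinessSketch {

    // Registers the read end of a pipe with a Selector, writes to the
    // write end, and returns whatever the readiness event lets us read.
    static String roundTrip(String message) throws Exception {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {

            // Only non-blocking channels can be registered with a Selector.
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

            StringBuilder out = new StringBuilder();
            // Wait up to one second for any registered channel to become ready.
            if (selector.select(1000) > 0) {
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()) {
                        ByteBuffer buf = ByteBuffer.allocate(256);
                        int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                        if (n > 0) {
                            out.append(new String(buf.array(), 0, n, StandardCharsets.UTF_8));
                        }
                    }
                }
                // The selected-key set is not cleared automatically.
                selector.selectedKeys().clear();
            }
            return out.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ready"));
    }
}
```

A user-level scheduler would do the same thing with many channels registered on one selector, resuming whichever lightweight thread was waiting on each ready key.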

[1] Solaris 7 had /dev/poll, but M:N threading predated it. /dev/poll wasn't a public API until Solaris 8, yet by Solaris 9 M:N threading was already deprecated.

[2] SIGIO/SIGPOLL was a thing, but POSIX signal semantics are simply too cumbersome. Interrupts vs polling for software interfaces was and remains an entire debate unto itself, ongoing since the 1970s (1960s?).

[3] poll hinting, where the kernel optimistically installed persistent event watchers, also existed, at least experimentally (see https://web.archive.org/web/20020226200329/http://www.hum...), but it was more of a band-aid and marginal fix. I think some BSDs effectively ended up implementing this later as part of refactoring their VFS layer and poll/select around the internal kevent (kqueue) system.

JDK 21 released

Posted Sep 21, 2023 5:35 UTC (Thu) by joib (subscriber, #8541) [Link] (6 responses)

Yes, it seems that a "full" POSIX C threads API using an M:N model bogs down in issues with preemption, blocking syscalls, etc., which implementers then try to solve with scheduler activations and whatnot. And in the end it turned out a simple 1:1 model was the better one.

But it seems M:N threading is more successful in a runtime for higher level languages, where you can convert blocking I/O into epoll/io_uring/etc. behind the scenes, you can do things like split stacks or movable stacks, and maybe you can make do with co-operative multithreading in the user-level scheduler rather than supporting full preemption, etc.

JDK 21 released

Posted Sep 21, 2023 12:42 UTC (Thu) by ms-tg (subscriber, #89231) [Link] (5 responses)

> But it seems M:N threading is more successful in a runtime for higher level languages, where you can convert blocking I/O into epoll/io_uring/etc. behind the scenes, you can do things like split stacks or movable stacks, and maybe you can make do with co-operative multithreading in the user-level scheduler rather than supporting full preemption, etc.

I’m curious how this will go in Java - do they plan to instrument *every* existing IO API with inherently blocking semantics as you describe, so that transparently and with no code changes, the “blocking” IO can occur at runtime as a paused virtual thread which transparently wakes up when the IO is ready?

In the Ruby world there were many evolutions of this model, now there are IO hooks in a core Fiber Scheduler interface: https://docs.ruby-lang.org/en/3.2/Fiber/Scheduler.html

My understanding is that in Rust, Python, Typescript and Javascript, no such unification of blocking code semantics with non-blocking IO via lightweight virtual threads is currently offered - meaning pretty much every IO interface needs to be duplicated in a way that signals async readiness in order to power async/await or straight callback code patterns, is that accurate?

From the Ruby world it felt

JDK 21 released

Posted Sep 23, 2023 6:03 UTC (Sat) by znix (subscriber, #159961) [Link] (3 responses)

> the “blocking” IO can occur at runtime as a paused virtual thread which transparently wakes up when the IO is ready?

Yep, this is how it works. It's supposed to be plug-and-play, with no code changes required.

Java has had a pretty powerful IO API - New I/O or 'NIO' - that's supported both synchronous and asynchronous modes for ages, across both network and file IO.

My understanding is that most of the other IO APIs have been slowly rewritten in Java (eg by JEP 353) to internally use NIO, so that adding Loom support to NIO makes all the other APIs also support Loom.

This works because FFI is relatively rarely used in Java, so the problem of a native function in a library making a blocking syscall is a relatively small one - there's a few database APIs that do this, but AFAIK they're mostly being rewritten in pure Java.

JDK 21 released

Posted Sep 23, 2023 21:09 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> Java has had a pretty powerful IO API - New I/O or 'NIO' - that's supported both synchronous and asynchronous modes for ages, across both network and file IO.

NIO has been present since 2002 or so, and it was almost completely useless. Pretty much nothing in the Java ecosystem supported it. It also turned out to be kinda useless for lightweight threads, so that the NIO core had to be rewritten to support them.

Also, NIO doesn't support async file operations on Linux (I haven't checked Solaris or Windows).

JDK 21 released

Posted Sep 23, 2023 23:28 UTC (Sat) by znix (subscriber, #159961) [Link] (1 responses)

> It also turned out to be kinda useless for lightweight threads, so that the NIO core had to be rewritten to support them.

Oh, interesting! I didn't realise that.

> Also, NIO doesn't support async file operations on Linux (I haven't checked Solaris or Windows).

I thought it did on Solaris, but I may very well be mistaken.

JDK 21 released

Posted Sep 25, 2023 3:38 UTC (Mon) by ssmith32 (subscriber, #72404) [Link]

You may have been thinking of the NIO.2 additions, which, unlike the initial release of NIO, do support async IO, and are quite powerful.

So technically, your original statement is correct, NIO does support a powerful API and async I/O, but it didn't at first. Once Java 7 rolled around, they added a bunch of functionality to nio, which, lumped together, was referred to as NIO.2 at the time.

Nowadays, everyone just calls it NIO.
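
For illustration, a minimal sketch of the asynchronous file API from those NIO.2 additions, `java.nio.channels.AsynchronousFileChannel`. The class and method names (`AsyncFileSketch`, `writeThenReadAsync`) and the temporary file are made up for the example. Note that "asynchronous" here describes the API shape; as far as I know the default Linux implementation services these calls from a thread pool rather than kernel AIO, which fits the earlier remark about async file operations on Linux.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncFileSketch {

    // Writes text to a temporary file, then reads it back through the
    // asynchronous channel API and returns what was read.
    static String writeThenReadAsync(String text) throws Exception {
        Path tmp = Files.createTempFile("nio2-sketch", ".txt");
        try {
            Files.writeString(tmp, text, StandardCharsets.UTF_8);
            try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(1024);
                // read() returns immediately; the Future completes when data is in buf.
                Future<Integer> pending = channel.read(buf, 0);
                int n = pending.get(); // the caller could do other work before this
                if (n < 0) {
                    return ""; // end of file reached immediately: empty file
                }
                return new String(buf.array(), 0, n, StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeThenReadAsync("hello from NIO.2"));
    }
}
```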

JDK 21 released

Posted Sep 23, 2023 20:58 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

> I’m curious how this will go in Java - do they plan to instrument *every* existing IO API

Basically. Java doesn't have a lot of IO interfaces, so it's not such a huge task. The way the JDK handles this is similar to Go: each time a potentially blocking operation happens, the virtual thread is "pinned" to the system thread, so that the Java lightweight thread scheduler knows not to wait for it to become available.

For example, here's the code that handles file reading: https://github.com/openjdk/jdk/blob/master/src/java.base/... This basically means that any file IO will still require thread-per-operation.

Network IO is special, so it has hooks into the lightweight scheduler. For example, network blocking reads ultimately end up here: https://github.com/openjdk/jdk/blob/a2391a92cd09630cc3c46... The code transparently yields to the scheduler in case of lightweight threads, or just does a blocking wait if it's started from a real thread.

One bad thing is proliferation of special-casing. For example, JDK developers really wanted to support thread cancellation. But this means that threads might leak network connections, so they are automatically closed if this happens: https://github.com/openjdk/jdk/blob/a2391a92cd09630cc3c46... This already leads to some problems where the code actually wants to handle interrupts.

JDK 21 released

Posted Sep 25, 2023 2:47 UTC (Mon) by ssmith32 (subscriber, #72404) [Link]

Yes. That was explicitly mentioned in the JEP:

https://openjdk.org/jeps/444

In the asynchronous style, each stage of a request might execute on a different thread, and every thread runs stages belonging to different requests in an interleaved fashion. This has deep implications for understanding program behavior: Stack traces provide no usable context, debuggers cannot step through request-handling logic, and profilers cannot associate an operation's cost with its caller. Composing lambda expressions is manageable when using Java's stream API to process data in a short pipeline but problematic when all of the request-handling code in an application must be written in this way. This programming style is at odds with the Java Platform because the application's unit of concurrency — the asynchronous pipeline — is no longer the platform's unit of concurrency"

JDK 21 released

Posted Sep 25, 2023 2:59 UTC (Mon) by ssmith32 (subscriber, #72404) [Link]

Yeah, the virtual threads JEP mentions this, but points out that the previous attempt wasn't even M:N, it was M:1. So the performance was never going to get there.

"User-mode threads even featured as so-called "green threads" in early versions of Java, when OS threads were not yet mature and widespread. However, Java's green threads all shared one OS thread (M:1 scheduling) and were eventually outperformed by platform threads, implemented as wrappers for OS threads (1:1 scheduling)."

https://openjdk.org/jeps/444
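
To show how the two kinds of thread look from application code, here is a small sketch, assuming JDK 21 or later. `ThreadKindsSketch` and `ranOnVirtual` are names invented for the example; `Thread.ofPlatform()`, `Thread.ofVirtual()` and `Thread.isVirtual()` are the JDK 21 APIs.

```java
public class ThreadKindsSketch {

    // Starts a thread from the given builder, runs a trivial task on it,
    // and reports whether that thread was a virtual thread.
    static boolean ranOnVirtual(Thread.Builder builder) throws InterruptedException {
        boolean[] virtual = new boolean[1];
        Thread t = builder.start(() -> virtual[0] = Thread.currentThread().isVirtual());
        t.join(); // join() makes the write to virtual[0] visible here
        return virtual[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // Platform thread: a wrapper around an OS thread (1:1 scheduling).
        System.out.println("ofPlatform -> virtual=" + ranOnVirtual(Thread.ofPlatform()));
        // Virtual thread: scheduled by the JDK onto a pool of carrier threads (M:N).
        System.out.println("ofVirtual  -> virtual=" + ranOnVirtual(Thread.ofVirtual()));
    }
}
```

The task itself is identical in both cases; only the builder changes, which is the sense in which existing thread-based code is meant to carry over.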

JDK 21 released

Posted Sep 25, 2023 3:24 UTC (Mon) by ssmith32 (subscriber, #72404) [Link]

Not terribly modern, it's been in and out of languages (including Java), libraries, and operating systems for a long time now.

But it is a useful addition, particularly to stave off the various problems that come with "async" libraries in Java (in quotes, because the main implementation of the Flow API - reactive - is just an awkward interface to a Netty event loop on top of a thread pool). I've been stuck using those for a variety of reasons for a bit, and the problems called out in the JEP are very real (and just the start).

In addition, they are going to get a lot more useful when paired with Structured Concurrency, which is still in preview:

https://docs.oracle.com/en/java/javase/21/core/structured...

JDK 21 released

Posted Sep 20, 2023 7:38 UTC (Wed) by flussence (guest, #85566) [Link]

I don't pay much attention to this language (and am maybe starting to regret that) - some of these changes look really good. UTF-8 I/O by default! Safe string interpolation!