JDK 21 released
Posted Sep 20, 2023 13:32 UTC (Wed) by MarcB (guest, #101804)
In reply to: JDK 21 released by vasvir
Parent article: JDK 21 released
One thread per request (or connection) seems to be a much simpler model, but it can have too much overhead on a 1:1 implementation.
Posted Sep 20, 2023 14:29 UTC (Wed)
by paulj (subscriber, #341)
[Link] (13 responses)
There are issues of preemption, blocking, etc.
I wonder what kind of state a user-space lwt scheduler would /not/ need that the OS does, that gives wins here?
Intuitively, a problem-domain-specific work-task:OS-thread multiplexer seems like it could give scalability wins, but I'm less convinced about M:N lwt:OS threads - if that were such an easy win, why did Linux and Solaris go for 1:1 after much deliberation and (in the case of Solaris) experience with M:N models?
Posted Sep 20, 2023 16:07 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (3 responses)
The only bit of state that the kernel scheduler must deal with, but a user scheduler can potentially ignore, is the CPU context (registers etc).
However, back when threads first became a significant part of OSes, kernels had a lot of state associated with each thread. This made M:N models look very attractive, since the user scheduler was being built up with the bare minimum of state, not being cut down from the process scheduler (where the cost of switching MMU setups was high enough to hide the cost of handling excess state).
Posted Sep 21, 2023 10:19 UTC (Thu)
by paulj (subscriber, #341)
[Link] (2 responses)
A good chunk of the kernel sub-system and book-keeping could be saved by LWTs I guess. But... then you lose stuff like... NUMA balancing, separation between threads (e.g. resources via cgroups, security via LSM). However, you could argue: Who on earth would design a process that relied on different traditional threads within the process having different resource or security contexts?
I guess the argument could be made that the common case of traditional threads (i.e. shared memory, shared resources, shared everything bar a few contexts like CPU regs, stack) should be handled specially and made much lighter-weight in the kernel. I.e. a thread should be able to run with just the stuff from thread_struct - it shouldn't need a whole task_struct for each thread?
In the context of this article, Java and its M:N LWTs, it is of course not trying to present a full Linux or POSIX thread interface to user-space, so that's a much "easier" job.
Posted Sep 21, 2023 11:08 UTC (Thu)
by MarcB (guest, #101804)
[Link] (1 responses)
Exactly, the idea here explicitly is not to replace "real" threads. It is intended as an alternative to asynchronous APIs that would all be used from a single OS thread anyway.
The primary goal, according to https://openjdk.org/jeps/444, is to "Enable server applications written in the simple thread-per-request style to scale with near-optimal hardware utilization."
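That thread-per-request style can be sketched with JDK 21's virtual-thread executor. This is a minimal illustration, not from the thread; the `serve` method and the `Thread.sleep` stand-in for blocking I/O are invented for the example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPerRequest {
    // Handle each "request" on its own virtual thread. Blocking in the
    // task parks the virtual thread; a small pool of carrier OS threads
    // keeps running other requests in the meantime.
    static int serve(int requests) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10); // stand-in for blocking I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(serve(10_000) + " requests served");
    }
}
```

Spawning 10,000 OS threads for this would be costly; 10,000 virtual threads are routine, which is exactly the "near-optimal hardware utilization" the JEP is after.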
Posted Sep 21, 2023 13:14 UTC (Thu)
by paulj (subscriber, #341)
[Link]
The other interesting question here is to ask whether the kernel should have much lighter threads. Full tasks come with an _immense_ amount of baggage, for separation/security/etc. It'd be nice to have in-kernel light-weight threads, that just do the minimum needed for share-everything-bar-CPU-and-stack with POSIX thread APIs.
Posted Sep 20, 2023 21:12 UTC (Wed)
by dcoutts (guest, #5387)
[Link] (1 responses)
There are examples out there. Notably, GHC (the primary Haskell compiler) supports light-weight pre-emptive threads which are much cheaper than OS threads, both in memory and context switching time. It has a simple thread scheduler in the RTS. I'm not sure exactly what state it saves vs an OS thread mainly because I'm not sure exactly what state a Linux thread carries, but a GHC thread is just a small struct and a small initial stack (less than 4k).
This means one can get great performance from the simple classic server design of one thread per client. 10k or 100k such threads is perfectly reasonable.
The main difficulty with libc M:N schedulers is pre-emption: usually it is expensive, complex, or both. There's quite a bit of academic research that looks at solutions involving compiler support for inserting cheaper pre-emption points. This is the approach that GHC takes too.
For example: https://dl.acm.org/doi/abs/10.1145/2400682.2400695
And also advocated by: https://www.usenix.org/legacy/events/hotos03/tech/full_pa...
Posted Sep 21, 2023 1:51 UTC (Thu)
by wahern (subscriber, #37304)
[Link]
Relatedly, O(1) event polling APIs like kqueue, epoll, and Solaris Ports[1] didn't exist back when M:N threading was being debated. Those APIs don't solve pre-emption, but without such APIs the ability to spawn hundreds or thousands of threads in a network server didn't make any sense--a scheduler relying on select/poll had little chance of "C10K" scaling.[2][3] Such O(1) event interfaces were a necessary, if crude, facility for user land scheduling to be meaningfully competitive with 1:1 threads for any real-world workloads. The fact that they composed well (i.e. they're built around a special file descriptor, rather than some global facility or state) also made it easier for user land developers, including language implementors, to experiment.
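In Java this composable event-polling style surfaces as the NIO `Selector`, which is backed by epoll on Linux and kqueue on the BSDs. A small sketch (the `PollDemo` class and its `Pipe`-based setup are illustrative, not from the thread):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class PollDemo {
    // One O(1) wait covers any number of registered channels; a user-land
    // scheduler can park thousands of tasks behind a single select() call.
    static int readyAfterWrite() throws IOException {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);

            pipe.sink().write(ByteBuffer.wrap(new byte[] {42})); // make the source readable
            int ready = selector.select(); // wakes once at least one channel is ready
            pipe.sink().close();
            pipe.source().close();
            return ready;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("ready channels: " + readyAfterWrite());
    }
}
```

With select/poll the kernel re-scans every descriptor on each call; with an epoll-backed Selector the watch set is registered once, which is what makes the many-threads-behind-one-poller design scale.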
NetBSD, like Solaris before it, did end up with scheduler activations to support its M:N threading model, but I assume it was too little, too late. And AFAIU its event reporting was based on signals (similar to SIGIO/SIGPOLL), which effectively meant it was an interface only NetBSD system developers could iterate on and make use of. (I'm also not sure what types of events it reported. Did it even resolve the network connection scaling problem? Was Solaris' /dev/poll interface an offshoot of its scheduler activation framework?)
[1] Solaris 7 had /dev/poll, but M:N threading predated it. /dev/poll wasn't a public API until Solaris 8, yet by Solaris 9 M:N threading was already deprecated.
[2] SIGIO/SIGPOLL was a thing, but POSIX signal semantics are simply too cumbersome. Interrupts vs polling for software interfaces was and remains an entire debate unto itself, ongoing since the 1970s (1960s?).
[3] poll hinting, where the kernel optimistically installed persistent event watchers, also existed, at least experimentally (see https://web.archive.org/web/20020226200329/http://www.hum...), but it was more of a band-aid and marginal fix. I think some BSDs effectively ended up implementing this later as part of refactoring their VFS layer and poll/select around the internal kevent (kqueue) system.
Posted Sep 21, 2023 5:35 UTC (Thu)
by joib (subscriber, #8541)
[Link] (6 responses)
But it seems M:N threading is more successful in a runtime for higher level languages, where you can convert blocking I/O into epoll/io_uring/etc. behind the scenes, you can do things like split stacks or movable stacks, and maybe you can make do with co-operative multithreading in the user-level scheduler rather than supporting full preemption, etc.
Posted Sep 21, 2023 12:42 UTC (Thu)
by ms-tg (subscriber, #89231)
[Link] (5 responses)
I’m curious how this will go in Java - do they plan to instrument *every* existing IO API with inherently blocking semantics as you describe, so that transparently and with no code changes, the “blocking” IO can occur at runtime as a paused virtual thread which transparently wakes up when the IO is ready?
In the Ruby world there were many evolutions of this model, now there are IO hooks in a core Fiber Scheduler interface: https://docs.ruby-lang.org/en/3.2/Fiber/Scheduler.html
My understanding is that in Rust, Python, Typescript and Javascript, no such unification of blocking code semantics with non-blocking IO via lightweight virtual threads is currently offered - meaning pretty much every IO interface needs to be duplicated in a way that signals async readiness in order to power async/await or straight callback code patterns, is that accurate?
From the Ruby world it felt
Posted Sep 23, 2023 6:03 UTC (Sat)
by znix (subscriber, #159961)
[Link] (3 responses)
Yep, this is how it works. It's supposed to be plug-and-play, with no code changes required.
Java has had a pretty powerful IO API - New I/O or 'NIO' - that's supported both synchronous and asynchronous modes for ages, across both network and file IO.
My understanding is that most of the other IO APIs have been slowly rewritten in Java (eg by JEP 353) to internally use NIO, so that adding Loom support to NIO makes all the other APIs also support Loom.
This works because FFI is relatively rarely used in Java, so the problem of a native function in a library making a blocking syscall is a relatively small one - there's a few database APIs that do this, but AFAIK they're mostly being rewritten in pure Java.
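The "plug-and-play" claim can be seen in a sketch: the code below is classic blocking-style socket I/O with no async API in sight, yet because JEP 353 reimplemented `java.net.Socket` on top of NIO, the same `read()` that would tie up an OS thread merely parks a virtual thread. The class and method names here are invented for the example:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TransparentBlocking {
    // Blocking-style client code, unchanged from the pre-Loom era.
    // On a virtual thread, read() parks the virtual thread and frees
    // its carrier OS thread instead of blocking it.
    static int readOneByte() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 0, InetAddress.getLoopbackAddress())) {
            int[] got = new int[1];
            Thread client = Thread.ofVirtual().start(() -> {
                try (Socket s = new Socket(server.getInetAddress(), server.getLocalPort())) {
                    got[0] = s.getInputStream().read(); // "blocks" (parks) until data arrives
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            try (Socket peer = server.accept()) {
                peer.getOutputStream().write(42);
            }
            client.join();
            return got[0];
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("client got " + readOneByte());
    }
}
```

Nothing in the client lambda mentions virtual threads or readiness events; only the `Thread.ofVirtual()` launch site changed, which is the whole point of rewriting the legacy APIs over NIO first.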
Posted Sep 23, 2023 21:09 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
NIO has been present since 2002 or so, and it was almost completely useless. Pretty much nothing in the Java ecosystem supported it. It also turned out to be kinda useless for lightweight threads, so that the NIO core had to be rewritten to support them.
Also, NIO doesn't support async file operations on Linux (I haven't checked Solaris or Windows).
Posted Sep 23, 2023 23:28 UTC (Sat)
by znix (subscriber, #159961)
[Link] (1 responses)
Oh, interesting! I didn't realise that.
> Also, NIO doesn't support async file operations on Linux (I haven't checked Solaris or Windows).
I thought it did on Solaris, but I may very well be mistaken.
Posted Sep 25, 2023 3:38 UTC (Mon)
by ssmith32 (subscriber, #72404)
[Link]
So technically, your original statement is correct: NIO does support a powerful API and async I/O, but it didn't at first. Once Java 7 rolled around, they added a bunch of functionality to NIO which, lumped together, was referred to as NIO.2 at the time.
Nowadays, everyone just calls it NIO.
Posted Sep 23, 2023 20:58 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Basically. Java doesn't have a lot of IO interfaces, so it's not such a huge task. The way the JDK handles this is similar to Go: each time a potentially blocking operation happens, the virtual thread is "pinned" to the system thread, so that the Java lightweight thread scheduler knows not to wait for it to become available.
For example, here's the code that handles file reading: https://github.com/openjdk/jdk/blob/master/src/java.base/... This basically means that any file IO will still require thread-per-operation.
Network IO is special, so it has hooks into the lightweight scheduler. For example, network blocking reads ultimately end up here: https://github.com/openjdk/jdk/blob/a2391a92cd09630cc3c46... The code transparently yields to the scheduler in case of lightweight threads, or just does a blocking wait if it's started from a real thread.
One bad thing is proliferation of special-casing. For example, JDK developers really wanted to support thread cancellation. But this means that threads might leak network connections, so they are automatically closed if this happens: https://github.com/openjdk/jdk/blob/a2391a92cd09630cc3c46... This already leads to some problems where the code actually wants to handle interrupts.
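The cancellation behavior described here can be demonstrated in a sketch (class and method names invented for the example): interrupting a virtual thread parked in a blocking socket read closes the underlying socket, so the read fails immediately instead of waiting for data that will never come. This is what makes "handle the interrupt and keep using the connection" impossible:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class InterruptUnblocksRead {
    // A virtual thread parked in read() is woken by interrupt(), but
    // the price is that the JDK closes the socket underneath it.
    static boolean interruptTerminatesRead() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 0, InetAddress.getLoopbackAddress())) {
            boolean[] readFailed = new boolean[1];
            Thread t = Thread.ofVirtual().start(() -> {
                try (Socket s = new Socket(server.getInetAddress(), server.getLocalPort())) {
                    s.getInputStream().read(); // parks; nothing is ever written
                } catch (IOException e) {
                    readFailed[0] = true;      // socket was closed by the interrupt
                }
            });
            try (Socket peer = server.accept()) {
                Thread.sleep(100); // let the reader park first
                t.interrupt();     // "cancel" the request
                t.join();
            }
            return readFailed[0];
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read unblocked by interrupt: " + interruptTerminatesRead());
    }
}
```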
Posted Sep 25, 2023 2:47 UTC (Mon)
by ssmith32 (subscriber, #72404)
[Link]
"
In the asynchronous style, each stage of a request might execute on a different thread, and every thread runs stages belonging to different requests in an interleaved fashion. This has deep implications for understanding program behavior: Stack traces provide no usable context, debuggers cannot step through request-handling logic, and profilers cannot associate an operation's cost with its caller. Composing lambda expressions is manageable when using Java's stream API to process data in a short pipeline but problematic when all of the request-handling code in an application must be written in this way. This programming style is at odds with the Java Platform because the application's unit of concurrency — the asynchronous pipeline — is no longer the platform's unit of concurrency"
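The contrast that quote draws can be made concrete in a small sketch; the handler, `fetch`, and `enrich` names are invented for the example:

```java
import java.util.concurrent.CompletableFuture;

public class StyleContrast {
    // Asynchronous style: one request hops across pipeline stages that
    // may run on different threads; a stack trace taken inside enrich()
    // shows executor frames, not the request's call chain.
    static CompletableFuture<String> handleAsync(String req) {
        return CompletableFuture.supplyAsync(() -> fetch(req))
                .thenApply(StyleContrast::enrich)
                .thenApply(String::toUpperCase);
    }

    // Thread-per-request style: the same logic as ordinary sequential
    // code. Run on a virtual thread it scales comparably, and fetch()
    // and enrich() appear in the caller's own stack trace.
    static String handleBlocking(String req) {
        return enrich(fetch(req)).toUpperCase();
    }

    static String fetch(String req) { return req + ":data"; }
    static String enrich(String s)  { return s + ":enriched"; }

    public static void main(String[] args) {
        System.out.println(handleAsync("r1").join());  // R1:DATA:ENRICHED
        System.out.println(handleBlocking("r1"));      // R1:DATA:ENRICHED
    }
}
```

Both paths compute the same result; the difference the JEP cares about is that in the second one the application's unit of concurrency is again a thread, so stack traces, debuggers, and profilers work as usual.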