Fibrils and asynchronous system calls
Zach Brown has decided to stir things up by asking a basic question: could it be that the way the kernel implements AIO is all wrong? The current approach adds a fair amount of complexity, requiring explicit AIO handling in every subsystem which supports it. IOCB structures have to be passed around, and kernel code must always check whether it is supposed to block on a given operation or return one of two "it's in the works" codes. It would be much nicer if most kernel operations could simply be invoked asynchronously without having to clutter them up with explicit support.
To that end, Zach has posted a preliminary patch set which simplifies asynchronous I/O support considerably, but doesn't stop there: it also makes any system call invokable in an asynchronous mode. The key is a new type of in-kernel lightweight thread known as a "fibril."
A fibril is an execution thread which runs only in kernel space. A process can have any number of fibrils active, but only one of them can actually be executing on the processor(s) at any given time. Fibrils have their own stack but otherwise share all of the resources of their parent process. They are kept in a linked list attached to the task structure.
When a process makes an asynchronous system call, the kernel creates a new fibril and executes the call in that context. If the system call completes immediately, the fibril is destroyed and the result goes back to the calling process in the usual way. Should the fibril block, however, it gets queued and control returns to the submitting code, which can then return the "it's in progress" status code. The "main" process can then run in user space, submit more asynchronous operations, or do just about anything else.
Sooner or later, the operation upon which the fibril blocked will complete. The wait queue entry structure has been extended to include information on which fibril was blocked; the wakeup code will find that fibril and make it runnable by adding it to a special "run queue" linked list in the parent task structure. The kernel will then schedule the fibril for execution, perhaps displacing the "main" process. That fibril might make some progress and block again, or it may complete its work. In the latter case, the final exit code is saved and the fibril is destroyed.
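To make that concrete, here is a rough sketch, in kernel-style C, of the bookkeeping described above; the structure and field names are invented for illustration and are not taken from Zach's patch:

    #include <linux/list.h>
    #include <linux/wait.h>

    /* Illustrative sketch only; names do not match the actual patch. */
    struct fibril {
        struct list_head    fibril_list;   /* all fibrils of this task */
        struct list_head    run_list;      /* entry in the per-task run queue */
        void                *stack;        /* the fibril's private stack */
        long                exit_code;     /* saved result at completion */
        /* saved processor state omitted */
    };

    /*
     * A wait queue entry extended to record which fibril blocked, so
     * that the wakeup path can find it and add it to the parent
     * task's run queue.
     */
    struct fibril_wait {
        wait_queue_t        wait;
        struct fibril       *fibril;
    };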
By moving asynchronous operations into a separate thread, Zach's patch simplifies their implementation considerably; with few exceptions, kernel code need not be changed at all to support asynchronous calls. Fibrils are intended to be cheaper to create and run than kernel threads or ordinary processes, and their one-at-a-time semantics help to minimize the concurrency issues which might otherwise come up.
The user-space interface starts with a structure like this:
    struct asys_input {
        int             syscall_nr;
        unsigned long   cookie;
        unsigned long   nr_args;
        unsigned long   *args;
    };
The application is expected to put the desired system call number in syscall_nr; the arguments to that system call are described by args and nr_args. The cookie value will be given back to the process when the operation completes. User space can create an array of these structures and pass them to:
    long asys_submit(struct asys_input *requests, unsigned long nr_requests);
The kernel will then start each of the requests in a fibril and return to user space. When the process develops an interest in the outcome of its requests, it uses this interface:
    struct asys_completion {
        long            return_code;
        unsigned long   cookie;
    };

    long asys_await_completion(struct asys_completion *comp);
A call to asys_await_completion() will block until at least one asynchronous operation has completed, then return the result in the structure pointed to by comp. The cookie value given at submission time is returned as well.
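As a usage sketch (this example is not part of the patch set): assuming the two entry points are reachable through syscall(2), and with placeholder syscall numbers invented for the example, an application could start an asynchronous read() like this:

    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/syscall.h>

    struct asys_input {
        int             syscall_nr;
        unsigned long   cookie;
        unsigned long   nr_args;
        unsigned long   *args;
    };

    struct asys_completion {
        long            return_code;
        unsigned long   cookie;
    };

    /* Placeholder syscall numbers, invented for this example. */
    #define __NR_asys_submit            330
    #define __NR_asys_await_completion  331

    int main(void)
    {
        static char buf[4096];
        int fd = open("/etc/hosts", O_RDONLY);
        unsigned long args[3] = {
            (unsigned long) fd, (unsigned long) buf, sizeof(buf)
        };
        struct asys_input req = {
            .syscall_nr = __NR_read,    /* run read() asynchronously */
            .cookie     = 1,
            .nr_args    = 3,
            .args       = args,
        };
        struct asys_completion comp;

        /* Start the read in a fibril; this returns once it is queued. */
        syscall(__NR_asys_submit, &req, 1UL);

        /* ... do other useful work here ... */

        /* Block until the operation completes, then pick up the result. */
        syscall(__NR_asys_await_completion, &comp);
        printf("cookie %lu: read() returned %ld\n",
               comp.cookie, comp.return_code);
        return 0;
    }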
Your editor notes that the current asys_await_completion() implementation does not check to see if any asynchronous operations are outstanding; if none are, the call is liable to wait for a long time. There are a number of other issues with the patch set, all acknowledged by their author. For example, little thought has been given to how fibrils should respond to signals. Zach's purpose was not to present a completed work; instead, he wants to get the idea out there and see what people think of it.
Linus likes the idea:
I heartily approve, although I only gave the actual patches a very cursory glance. I think the approach is the proper one, but the devil is in the details. It might be that the stack allocation overhead or some other subtle fundamental problem ends up making this impractical in the end, but I would _really_ like for this to basically go in.
There are a lot of details - Linus noted that there is no limit on how many fibrils a process can create, for example - but this seems to be the way that he would like to see AIO implemented. He suggests that fibrils might be useful in the kevent code as well.
On the other hand, Ingo Molnar is opposed to the fibril approach; his argument is long but worth reading. In Ingo's view, there are only two solutions to any operating system problem which are of interest: (1) the one which is easiest to program with, and (2) the one that performs the best. In the I/O space, he claims, the easiest approach is synchronous I/O calls and user-space processes. The fastest approach will be "a pure, minimal state machine" optimized for the specific task; his Tux web server is given as an example.
According to Ingo, the fibril approach serves neither goal. Linux, he claims, is already sufficiently fast at switching between ordinary processes that the advantages offered by fibrils are minimal at best, while anybody who really wants performance will still have to face the full kernel AIO state machine. So, he says, there is no advantage to fibrils at this time which would be worth the cost of complicating the scheduler and moving away from the 1:1 thread model.
These patches are in an early stage, and this story will clearly take some
time to play out. Even if a consensus develops in favor of the fibril
idea, the process of turning them into a proper, robust kernel feature
could make them too expensive to be worthwhile. But it's an interesting
idea which brings a much-needed fresh look at how the kernel does AIO; it's
hard to complain too much about that.
Index entries for this article:
    Kernel: Asynchronous I/O
    Kernel: Fibrils
Fibrils and asynchronous system calls

Posted Feb 1, 2007 11:48 UTC (Thu)
by simlo (guest, #10866)
[Link] (1 responses)

Having several fibrils per thread could be used for event handling as well: interfaces often contain callbacks through which the user of a subsystem gets a function executed on a particular event. In multi-threaded environments this is hazardous due to deadlocks. However, if the callback can be executed in a fibril in the user's thread instead of in the originating thread, you can avoid deadlocks, and maybe even the need for a lock in the first place. (A fibril should not be allowed to run while the thread owns a mutex; does the sample patch take care of that?)

That would in fact be a message passing system, which is really, really useful for many applications. I am not a huge fan of message passing systems, but I think the technique has its places.

It reminds me of call/cc in functional programming, which can be used to emulate voluntary preemption. Using such a mechanism is much easier than writing explicit state machines.
Fibrils and asynchronous system calls

Posted Feb 2, 2007 2:27 UTC (Fri)
by pimlott (guest, #1535)
[Link]

> It reminds me of call/cc in functional programming

Bingo, I was hoping someone would say that. The functional world has some nice mechanisms for handling concurrency. For example, you can express an I/O operation as a description of the request, plus a function to run on the result (and this is natural with lambda and lexical closure). The runtime can manage all outstanding requests and pass the responses to the functions as they become available. Similarly, a driver can be expressed as a computation that returns either an answer, or a message indicating what resource it needs before it can continue, plus a function to run on that resource. The runtime again obtains the needed resources and runs the continuations. You get most of the benefit of event-driven programming, with the illusion of sequential processing (outside the runtime).

I'm reminded of Alan Cox's quote, "Computers are state machines. Threads are for people who can't program state machines." To which the best programmer I've ever known answered, "I can't program state machines!" The appearance of sequential execution is a boon to programmer productivity.
Fibrils and asynchronous system calls

Posted Feb 1, 2007 14:53 UTC (Thu)
by kirkengaard (guest, #15022)
[Link]

So a process which causes all of these mini-threadlets to exit simultaneously would be a de-fibril-ator? ;)
Fibrils and asynchronous system calls

Posted Feb 1, 2007 16:47 UTC (Thu)
by liljencrantz (guest, #28458)
[Link]

This looks a lot like various cooperative threading libraries I've seen. You basically set up a bunch of different stacks and pretend that each one is a new thread. Context switching is just a longjmp; it's significantly faster than a real context switch, and significantly easier to code for as well, since you don't have any problems with sharing data between threads. Multithreading is much easier when you get to say when and where context switches can take place.

I don't agree with Ingo Molnar's assertion that one should aim to make something either as simple or as fast as possible and never anything in between; I've found cooperative threading to sometimes offer a very nice tradeoff between the two.
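(A minimal, generic illustration of this kind of cooperative switching, using ucontext(3) rather than raw longjmp, and not taken from any particular library:)

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;

    static void coroutine(void)
    {
        printf("coroutine: started\n");
        swapcontext(&co_ctx, &main_ctx);   /* cooperative yield to main */
        printf("coroutine: resumed\n");
    }

    int main(void)
    {
        static char stack[64 * 1024];      /* the coroutine's private stack */

        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp   = stack;
        co_ctx.uc_stack.ss_size = sizeof(stack);
        co_ctx.uc_link          = &main_ctx;   /* resume main when it returns */
        makecontext(&co_ctx, coroutine, 0);

        swapcontext(&main_ctx, &co_ctx);   /* run until the first yield */
        printf("main: coroutine yielded\n");
        swapcontext(&main_ctx, &co_ctx);   /* resume until it finishes */
        printf("main: coroutine done\n");
        return 0;
    }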
Fibrils and asynchronous system calls

Posted Feb 1, 2007 21:27 UTC (Thu)
by schabi (guest, #14079)
[Link]

I fully agree with Ingo here. The real way to do this is to work completely asynchronously inside the kernel, and provide both an async and a sync interface to it.
Fibrils and asynchronous system calls

Posted Feb 1, 2007 22:56 UTC (Thu)
by Nir (guest, #27440)
[Link]

One of my systems generates approximately 1000 AIOs a second. What will be the cost in "IO wait"? Will the CPU suffer more? Will it be better or worse than the current implementation?

Raz
Fibrils and asynchronous system calls

Posted Feb 3, 2007 18:09 UTC (Sat)
by mikov (guest, #33179)
[Link] (1 responses)
In the Windows driver model all calls are asynchronous. Every user-mode request is represented by an IRP structure which can be linked into various lists, etc., and completed asynchronously. I have always liked that approach: it can be complicated to grasp and implement, but it is consistent, and it is, after all, the natural way to handle I/O operations at the low level.

Fibrils seem like an awful patch to me.
Windows NT

Posted Feb 15, 2007 16:02 UTC (Thu)
by IXRO (guest, #39871)
[Link]

Unfortunately that was slow enough that they introduced FastIO, which bypasses the stack.
The duality of Threads and Events

Posted Feb 12, 2007 5:57 UTC (Mon)
by HalfMoon (guest, #3211)
[Link]
I basically agree with Ingo here. Been there, done that.
The underlying observation is not new; threads and events/state-machines are the two basic ways to express concurrency in an OS context. Why do so many folk have a hard time understanding IRQ handling? Because that's purely an event mechanism. Why do so many people (think they) like threads? Because they've been taught that the world is really single threaded, in all those intro programming courses, and most text books don't burst that bubble. Hardware designers work with state machines all the time, of course...
The classic problem of evolving an operating system so it scales up without losing performance can also be stated as: how do we move this pile of legacy crap more towards a primarily event-driven model?
Of course it's not like threads will ever vanish. They work hand in hand with event frameworks, the other side of the coin ... and no sane person will say "heads" is always better than "tails". But on the other hand, a large system with 10,000 threads crunching away inside of it is going to be more wasteful than the equivalent one with all those threads waiting at the periphery, while the insides are an event-driven state machine chewing away at the requests issued by those threads.
Remember: ten thousand thread/fibril/LWP/... stacks take up a lot of memory (at 4K each, some 40MB), and the thing that makes them seem "easy to program" is in large part that critical system state is isolated on all those stacks. So when something goes wrong, that state is almost completely invisible to the code tasked with recovering. But if it were made explicit as part of the state machine, it would be available.
Concurrent programming isn't easy. And threads are not a silver bullet; there is no such thing.