Couldn't this be a good case for writing a kernel in a language that had coroutines or continuations? One can write the code in the easier to follow procedural style, and then make it asynchronous just by adding "yield" or "call/cc" at the appropriate points. I wonder how hard it would be to come up with a version of C with coroutines, and whether that could be used with the Linux kernel...
Is there some way you could take the synchronous implementation of a driver or filesystem or other kernel component, determine the blocking points (or just require them to be annotated), and then automatically generate a state-machine-based asynchronous implementation from the synchronous one? A single source code for both synch and asynch implementations would make one more confident they were both correct.