This uses the same mechanism we use in Rockbox to emulate a non-preemptive environment on normal PCs (the SDL-based simulators): pthreads gated by a single global semaphore, so only one is runnable at a time.
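For anyone who hasn't seen the trick, here is a minimal sketch of the idea. It is not the actual Rockbox code; `run_token`, `coop_yield`, `worker` and `run_demo` are made-up names. Every pthread has to hold the one global token to run, and "yielding" just means handing the token back and queueing up for it again.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NTHREADS 3
#define NSTEPS   4

/* Hypothetical global token: whoever holds it is the one running thread. */
static sem_t run_token;

/* Shared state, only touched while holding run_token, so no extra locking. */
static int trace[NTHREADS * NSTEPS];
static int trace_len;

/* Cooperative yield: give the token back, then wait to get it again. */
static void coop_yield(void)
{
    sem_post(&run_token);
    sem_wait(&run_token);
}

static void *worker(void *arg)
{
    int id = (int)(long)arg;

    sem_wait(&run_token);            /* wait for our turn before doing anything */
    for (int i = 0; i < NSTEPS; i++) {
        trace[trace_len++] = id;     /* safe: nobody else is runnable right now */
        coop_yield();
    }
    sem_post(&run_token);            /* done, let the others continue */
    return NULL;
}

/* Runs all workers to completion and returns the number of recorded steps. */
static int run_demo(void)
{
    pthread_t t[NTHREADS];

    trace_len = 0;
    sem_init(&run_token, 0, 1);      /* exactly one token in circulation */
    for (long i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, (void *)i);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&run_token);
    return trace_len;
}
```

Note that a plain semaphore makes no promise about who runs next, so the order of entries in `trace` can differ between runs; only the one-at-a-time property is guaranteed. Every yield is still a trip through the kernel scheduler.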
We actually got rid of this mechanism because it is much more performance-demanding than real cooperative user-mode threads implemented with setjmp/longjmp (pth works like that). The context switch overhead is huge compared to a plain longjmp().
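To show what "real" cooperative threads look like, here is a rough sketch. One caveat: it uses POSIX `makecontext()`/`swapcontext()` rather than setjmp/longjmp, because giving a new thread its own stack with only setjmp/longjmp needs platform-specific tricks that don't fit in a short example. The principle is the same: a switch is an ordinary function call in user space that saves one context and restores another. All names (`co_yield`, `co_task`, `run_coop_demo`) are made up for the example.

```c
#include <assert.h>
#include <ucontext.h>

#define CO_STACK_SIZE (64 * 1024)
#define CO_STEPS      3

static ucontext_t co_main, co_ctx[2];
static char co_stacks[2][CO_STACK_SIZE];  /* one private stack per task */
static int co_current;                    /* index of the running task */
static int co_trace[2 * CO_STEPS];
static int co_trace_len;

/* Switch to the other task; returns when someone switches back to us. */
static void co_yield(void)
{
    int prev = co_current;
    co_current = 1 - co_current;
    swapcontext(&co_ctx[prev], &co_ctx[co_current]);
}

static void co_task(void)
{
    for (int i = 0; i < CO_STEPS; i++) {
        co_trace[co_trace_len++] = co_current;
        co_yield();
    }
    /* Returning from here resumes uc_link, i.e. co_main. */
}

/* Runs the two tasks until the first one finishes; returns steps recorded. */
static int run_coop_demo(void)
{
    co_trace_len = 0;
    for (int i = 0; i < 2; i++) {
        int rc = getcontext(&co_ctx[i]);
        assert(rc == 0);
        (void)rc;
        co_ctx[i].uc_stack.ss_sp = co_stacks[i];
        co_ctx[i].uc_stack.ss_size = CO_STACK_SIZE;
        co_ctx[i].uc_link = &co_main;     /* where to go when the task returns */
        makecontext(&co_ctx[i], co_task, 0);
    }
    co_current = 0;
    swapcontext(&co_main, &co_ctx[0]);    /* start task 0 */
    return co_trace_len;
}
```

Unlike the semaphore version, the switching order here is fully deterministic: the two tasks strictly alternate, and no other thread of execution exists that the OS could schedule in between.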
One advantage of the pthread approach is that you can temporarily enable preemption during blocking I/O calls, so the whole program isn't blocked. Oh, and valgrind/gdb debuggability (we actually keep the emulation around for this purpose). But that's about it. For the most part, real cooperative threads perform better. That's our experience.
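The blocking I/O point can be sketched like this (again with made-up names, `io_token`, `coop_read`, `writer`, `run_io_demo`, and not taken from any real code base): drop the global token just before the blocking call and take it back right after, so the other threads keep running while this one sits in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical global token, as in the one-runnable-thread scheme. */
static sem_t io_token;

/* Blocking read that lets other threads run while we wait for data. */
static ssize_t coop_read(int fd, void *buf, size_t n)
{
    sem_post(&io_token);             /* "enable preemption" for the duration */
    ssize_t r = read(fd, buf, n);
    sem_wait(&io_token);             /* back to one-at-a-time */
    return r;
}

static void *writer(void *arg)
{
    int fd = *(int *)arg;

    sem_wait(&io_token);             /* can only run once the reader lets go */
    ssize_t w = write(fd, "x", 1);
    (void)w;
    sem_post(&io_token);
    return NULL;
}

/* Returns 0 if the reader got the byte the writer sent, -1 otherwise. */
static int run_io_demo(void)
{
    int fds[2];
    char c = 0;
    pthread_t t;

    if (pipe(fds) != 0)
        return -1;
    sem_init(&io_token, 0, 1);
    sem_wait(&io_token);             /* the calling thread is the running one */

    int rc = pthread_create(&t, NULL, writer, &fds[1]);
    assert(rc == 0);
    (void)rc;

    /* A plain read() here would deadlock: the writer could never get the token. */
    ssize_t r = coop_read(fds[0], &c, 1);

    sem_post(&io_token);
    pthread_join(t, NULL);
    sem_destroy(&io_token);
    close(fds[0]);
    close(fds[1]);
    return (r == 1 && c == 'x') ? 0 : -1;
}
```

With purely user-space cooperative threads there is no equivalent of this: a blocking `read()` stalls every thread in the process unless you switch to non-blocking I/O and do the waiting in your own scheduler.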