As I said, I was not trying to compare Win32 and POSIX, nor their specific implementations - only the relative cleanliness of the (specific subset of the) APIs.
Also, I explicitly restricted my qualification to core "IO operations and thread management". Most of the problems you raised relate to Winsock, User32, or COM, and so are completely beside the point.
I agree that ideally WaitForMultipleObjects() should have a much larger limit than MAXIMUM_WAIT_OBJECTS (64 handles), but that doesn't negate the fact that it is conceptually a very clean and powerful API. ReadFile[Ex]/WriteFile[Ex], GetOverlappedResult, CancelIo, etc. (even DeviceIoControl) - it is all very simple and orthogonal.
Thread management and IO use the same APIs: you can wait on anything (including custom IOCTLs!) using the same call. You also have a useful set of atomic operations (InterlockedXXX()).
I think that there is no need to compare this in detail to POSIX.
POSIX is what it is. It wasn't an effort to define a new clean API - it simply ratified the existing state of affairs, which had grown organically. It was a success because it exists and is portable and there are implementations from multiple vendors.
But we can't lie to ourselves by pretending that it is elegant or orthogonal or anything but a horrible mess.