> we put nearly everything that touched the hardware in user space, other than (as I suggested for graphics drivers) small kernel-space drivers to reflect interrupts to user space
Then you are actually agreeing with the existing KMS design, since that is pretty much how it is implemented. The small amount of hardware management and I/O needed to get data to and from the GPU is implemented as a kernel driver, and the bulk of the complexity lives in a userspace library (libEGL/libGL). Where there may be confusion is that the minimum viable complexity of the kernel driver may be quite a bit higher than for the hardware you were interfacing with in your example. The GPU is practically a whole other machine, with its own CPU, RAM, and I/O, yet it also shares resources with the host machine. There's no way to manage that without some level of cooperation between the GPU and the kernel running on the CPU.