L4 is not especially fast. It's just that it's mostly used on hardware slow enough to make constant context switches tolerable. Some ARM chips even have hardware support for fast context switching!
But in general, yes, it's slower than a monolithic kernel.