Time spent in the kernel scheduler hardly counts, because the scheduler runs so rarely compared with the user code. What counts is time spent in user code on the common path, where no pre-emption occurs; there, incrementing and decrementing a memory word (never mind fooling with a timer!) is expensive. By contrast, kernel code that compares a couple of memory words against a pattern is simple, and so is updating the process state when they match. So you just need an instruction sequence that does the right thing very cheaply when it isn't pre-empted, and that is easy to recognize and patch up when it is.
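To make the kernel-side half concrete, here is a toy model in Python, not real kernel code. Everything in it is an assumption made for illustration: the integer "opcodes", the three-instruction load/add/store pattern, the `SavedState` record, and the choice to patch by rewinding the saved program counter to the start of the sequence (rolling forward would be another option).

```python
from dataclasses import dataclass

# Made-up opcodes for a hypothetical load/add/store sequence.
LOAD, ADD, STORE, NOP = 0x10, 0x20, 0x30, 0x00
PATTERN = (LOAD, ADD, STORE)


@dataclass
class SavedState:
    pc: int  # index of the next instruction the thread would execute


def fixup_on_preempt(code, state):
    """Run only when a thread is pre-empted.

    If the saved pc is strictly inside an instance of PATTERN (the
    sequence has started but its final store has not yet executed),
    rewind pc to the start of the sequence so it re-runs from scratch.
    Returns True if the state was patched.
    """
    n = len(PATTERN)
    # Offsets 1..n-1: at offset 0 nothing has run, at offset n it is done.
    for offset in range(1, n):
        start = state.pc - offset
        if start >= 0 and tuple(code[start:start + n]) == PATTERN:
            state.pc = start
            return True
    return False


code = [NOP, LOAD, ADD, STORE, NOP]

mid_sequence = SavedState(pc=3)   # pre-empted just before STORE
patched = fixup_on_preempt(code, mid_sequence)

after_sequence = SavedState(pc=4)  # STORE already executed
untouched = fixup_on_preempt(code, after_sequence)
```

The user-level sequence itself carries no extra bookkeeping in this model; all of the cost sits in `fixup_on_preempt`, which only runs on the rare pre-emption path.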