Your example has convinced me that I was wrong to say that you could get away with only two buddies. If you have only two, the best you can do is have a ring of cpus. This is different from the case of cuckoo hashing and skew caches, where only two places are sufficient.
However, in your example, A and C have 3 buddies, but B and D have only two. It seems like the scheduling algorithm would be just as efficient if they all had three, and there would be a better chance that migrations would be beneficial.