Yeah, it seems like if the problem is that there are "too many" cores per socket, then instead of fixing the limit at 2, it should be fixed at "almost but not quite too many", e.g. 4. Then with 4 cores per socket, it behaves exactly as it did before; but with 8, you still only look at 4 other cores. Having the sets overlap seems like an obvious way to make sure things can find a good optimum within a few iterations.
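
To make the overlapping-set idea concrete, here is a rough sketch. It is not taken from any actual scheduler code. The function name `peer_set`, the `cap` parameter, the contiguous per-socket core numbering, and the "next few cores, wrapping around within the socket" rule are all assumptions made for illustration.

```python
def peer_set(cpu, cores_per_socket, cap=4):
    """Return the cores that `cpu` would examine, all in its own socket.

    Assumptions (hypothetical, for illustration only):
      - cores are numbered contiguously, socket by socket;
      - each core looks at the next `cap` cores, wrapping within the socket;
      - a core never appears in its own peer set.
    """
    socket_base = (cpu // cores_per_socket) * cores_per_socket
    offset = cpu - socket_base
    # Never look at more peers than actually exist in the socket.
    count = min(cap, cores_per_socket - 1)
    return [socket_base + (offset + step) % cores_per_socket
            for step in range(1, count + 1)]


if __name__ == "__main__":
    # Small socket: every core still sees all of its socket-mates.
    for cpu in range(4):
        print("4 cores/socket, cpu", cpu, "->", peer_set(cpu, 4))
    # Large socket: each core sees only `cap` peers, and neighbouring
    # cores' sets overlap, so load can propagate in a few iterations.
    for cpu in range(8):
        print("8 cores/socket, cpu", cpu, "->", peer_set(cpu, 8))
```

With this particular windowing, adjacent cores in an 8-core socket share three of their four peers, and any core can be reached from any other in at most two hops, which is the kind of overlap I have in mind.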