>> Or maybe get a pointer to a per-cpu structure at the beginning of your treatment and continue to use this pointer anyway, even if the processor has been changed under your foot.
This won't work if you are going to modify the per-cpu structure after the CPU has changed, because another thread running on the original CPU may be updating the same structure at the same time. So you'll end up needing either locking or atomic modifications for the structure. But since the structure is modified from its own CPU most of the time, both locking and atomic modifications should be fast: the cache line almost never bounces between CPUs.
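
To make the atomic variant concrete, here is a minimal user-space sketch using C11 atomics. It is not kernel code and does not use any kernel per-cpu API; `NR_CPUS`, `CACHE_LINE`, `struct percpu_stats`, `stats_for_cpu()`, `stats_record()` and `stats_total()` are all hypothetical names made up for illustration, and the 64-byte cache line size is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS    4   /* hypothetical fixed CPU count */
#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* One slot per CPU, aligned to its own cache line to avoid false sharing. */
struct percpu_stats {
    _Alignas(CACHE_LINE) atomic_ulong events;
};

static struct percpu_stats stats[NR_CPUS];

/*
 * Take the pointer once at the start of the treatment. The caller passes
 * the CPU it was running on at that moment; the pointer stays valid even
 * if the thread is migrated to another CPU afterwards.
 */
static struct percpu_stats *stats_for_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return &stats[cpu];
}

/*
 * Atomic update: correct no matter which CPU executes it. When the caller
 * is still on the owning CPU (the common case), the cache line is already
 * local, so the atomic operation does not have to pull it from elsewhere.
 */
static void stats_record(struct percpu_stats *s)
{
    atomic_fetch_add_explicit(&s->events, 1, memory_order_relaxed);
}

/* Sum all per-CPU slots; the result is only a snapshot under concurrency. */
static unsigned long stats_total(void)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < NR_CPUS; i++)
        sum += atomic_load_explicit(&stats[i].events, memory_order_relaxed);
    return sum;
}
```

A plain `s->events++` on a non-atomic field would lose updates in exactly the migration case described above; the atomic add costs a little on every call but stays correct when the thread ends up on a different CPU.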