Also, it occurs to me that even this strategy might be a bad one, since you still have the
problem of reads of unrelated variables causing some amount of cache line traffic for
writeable lines. Furthermore, the "best" strategy may vary by processor.
How many "write often" variables are there that are not per-CPU? If this set is moderately
small, perhaps forcing them all into separate cache lines with *nothing* in the holes is an
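acceptable increase in footprint on modern CPUs. And, on AMD CPUs, the cost may be lower to
begin with: the MOESI cache protocol they use adds an "owned" state (the "O" state), in which
a dirty line can be shared with other caches without first being written back to memory.
A write to such a line still has to invalidate the other copies, though, so this makes the
bounce cheaper rather than eliminating it.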
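To make the "separate cache lines with nothing in the holes" idea concrete, here is a rough sketch in plain C11. The 64-byte line size is an assumption (it varies by processor), and hot_counter, nr_dirtied, nr_written and same_cache_line are made-up names for illustration, not anything from an existing codebase. In the kernel itself you would presumably reach for the existing ____cacheline_aligned annotations rather than rolling your own.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; adjust for the target processor. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical wrapper for a "write often" variable. Aligning the member
 * to the line size also rounds sizeof(struct hot_counter) up to a full
 * line, so nothing else can be placed in the remaining bytes.
 */
struct hot_counter {
    alignas(CACHE_LINE_SIZE) uint64_t value;
};

/* Two illustrative hot variables; each gets a cache line of its own. */
static struct hot_counter nr_dirtied;
static struct hot_counter nr_written;

/* Returns nonzero if both addresses fall within the same cache line. */
static int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}

/* Bump both counters and return their sum, just to exercise them. */
static uint64_t bump_counters(void)
{
    nr_dirtied.value++;
    nr_written.value++;
    return nr_dirtied.value + nr_written.value;
}
```

The footprint cost is easy to estimate from this: each such variable takes a full line, so N of them cost N * 64 bytes under the assumption above, whatever their actual size.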