And since you say that "none of the options you are suggesting makes a slight difference", can you also clarify whether you have actually run a test case that demonstrates false sharing even with 64-bit scalars, and/or even with assembly instructions that should have forced synchronization?