It depends entirely on what you are programming. If you are building an HPC application, you know what you will be accessing.
However, if you write a JVM, you have no idea what the application running inside the JVM will be accessing. It is entirely possible that the application will generate data (once) with one thread, and then access it hundreds of times with another thread.
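To make that access pattern concrete, here is a minimal sketch in Java. The class name `WriteOnceReadMany` and the helper `run` are made up for illustration and do not come from any of the patches under discussion. One thread fills an array once, and a second thread then reads it many times. Where the pages actually end up depends on the JVM and the kernel; the point is only that nothing in the code tells the runtime in advance which thread will use the data most.

```java
import java.util.concurrent.atomic.AtomicLong;

public class WriteOnceReadMany {
    // Hypothetical helper: one thread writes the array once, a second
    // thread then sums it `passes` times. Returns the reader's total.
    static long run(int size, int passes) throws InterruptedException {
        long[] data = new long[size];
        AtomicLong total = new AtomicLong();

        // Producer: generates the data exactly once.
        Thread producer = new Thread(() -> {
            for (int i = 0; i < size; i++) {
                data[i] = i;
            }
        });
        producer.start();
        producer.join(); // writes are visible to threads started after this

        // Consumer: a different thread that reads the same data repeatedly.
        Thread consumer = new Thread(() -> {
            long sum = 0;
            for (int p = 0; p < passes; p++) {
                for (int i = 0; i < size; i++) {
                    sum += data[i];
                }
            }
            total.set(sum);
        });
        consumer.start();
        consumer.join();
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Sizes are arbitrary, chosen only to make the read phase dominate.
        System.out.println(run(1_000_000, 200));
    }
}
```

If the two threads happen to run on different nodes, every one of those read passes may go to remote memory, and the JVM itself could not have known that when the array was allocated.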
In the HPC case, it looks obvious that Peter's solution has less overhead. In the JVM case, it is not clear at all which approach would be the way to go. Maybe Andrea's code will automatically figure it out...
NUMA scheduling is a hard problem. Not because the solutions are difficult, but because nobody even knows exactly what all the problems look like.