Extending restartable sequences with virtual CPU IDs

Posted Mar 2, 2022 2:20 UTC (Wed) by calumapplepie (guest, #143655)
In reply to: Extending restartable sequences with virtual CPU IDs by compudj
Parent article: Extending restartable sequences with virtual CPU IDs

> The memory allocators (glibc, tcmalloc) will be some of the first heavy users, so I expect we will end up in a situation where having the VCPU_DOMAIN feature enabled will actually bring performance benefits to the overall system (including user-space) due to improved memory allocation/access locality.

It'd be very important how they use VCPU_DOMAIN. If they're indexing into an array of pointers, it wouldn't make a difference for cache locality on a few-core system: 8 cores * 8 bytes/pointer is 64 bytes, conveniently close to where I drew my arbitrary line. Of course, that's assuming they align these arrays to cache lines, but I think that's a perfectly reasonable optimization to expect.

Of course, if they're using them in any other way, than the cache optimizations are up for grabs.

> I would like to see numbers here

Sadly, like any annoying internet commentator, I am not tooled up to provide any numbers to back up my wild assertions.

> I expect that having the memory allocator pools spread over fewer vcpu ids (compared to number of possible cpus) will provide significant gains on workloads that have few concurrently running threads that migrate between cores.

That is an excellent argument that I did not think into: how migration between cores would interact with the cache locality. I think there are some problems with assuming that locality would naturally increase using VCPU versus a standard ID, however.

Firstly, I believe (though I could be very wrong) that the scheduler will try to preferentially place a process on the same core repeatedly, so it can benefit from L1 caches. If that is the case, VCPU makes fairly little difference in practice: the indexed structure will be repeatedly accessed at the same location, holding the same cache lines hot.

Also, because the differences only matter when tasks migrate cores, by necessity the L1 and (possibly) the L2 caches won't matter. The difference made by caches is thus limited sharply.

> You are also presuming that virtual CPU IDs bring a 2% performance degradation. Is that number made up or did you benchmark the patch series discussed in the article ?

Completely and utterly made up. These are the idle theories of a procrastinating college students.