Posted Feb 2, 2006 15:40 UTC (Thu) by man_ls
Parent article: The search for fast, scalable counters
Speaking as a complete outsider...
Per-CPU counters become increasingly inaccurate as the number of processors grows, however. Each processor has a certain residual count which has not yet been folded into the global count. In situations where counters tend to increase, the result will be a global count which underestimates the real value, and which is increasingly wrong on larger systems.
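To make the folding concrete, here is a minimal sketch of how I picture such a counter working (the names and the threshold M are made up, and locking on the global count is omitted; this is not the kernel's actual code):

    #define NR_CPUS   64
    #define MAX_LOCAL 32    /* M: largest residual a CPU may hold */

    static long global_count;           /* folded-in total */
    static long local_count[NR_CPUS];   /* per-CPU residuals */

    /* Called on the current CPU with preemption disabled. */
    static void counter_inc(int cpu)
    {
        if (++local_count[cpu] >= MAX_LOCAL) {
            global_count += local_count[cpu];   /* fold residual in */
            local_count[cpu] = 0;
        }
    }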
To avoid underestimation, a better approximation can be obtained by considering the mean value of each residual counter. If N is the number of processors and M is the maximum residual per counter, then each residual averages M / 2, so adding N * M / 2 to the global count would yield the expected value. The maximum error bound would be N * M / 2, while the standard deviation (i.e. the expected error) should be around sqrt(N) * M / 2. (In fact, if I'm not mistaken, the sum of N independent uniform residuals follows the Irwin-Hall distribution, but a normal is a good approximation by the central limit theorem.)
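In code, the corrected read would just add the expected residual (again a sketch with hypothetical names, not a real interface):

    /* Expected true value: the folded global count plus the mean
     * residual M / 2 on each of the N processors. */
    static long estimated_count(long global, int ncpus, long m)
    {
        return global + (long)ncpus * m / 2;
    }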
This means that the expected error should only grow with the square root of the number of processors: with 64 processors, the error is only 8 times bigger than on a single-processor system, since sqrt(64) = 8. Choosing a small enough max value M would make the measure even more precise.
And while we are at it: I understand that a semaphore is needed to increment each per-CPU value, but is one needed to read it? Why not read all the per-CPU counters on the fly and add them to the global count whenever a total is required?
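Reusing the names from the first sketch, I am thinking of a racy but lock-free read along these lines:

    /* Racy but lock-free: sum the residuals into the global count.
     * Each read of a long is assumed to be atomic, so the worst
     * case is a slightly stale total, never a corrupt one. */
    static long counter_read(void)
    {
        long sum = global_count;
        int cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++)
            sum += local_count[cpu];
        return sum;
    }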