User: Password:
|
|
Subscribe / Log in / New account

Mean value

Mean value

Posted Feb 2, 2006 17:37 UTC (Thu) by madscientist (subscriber, #16861)
In reply to: Mean value by man_ls
Parent article: The search for fast, scalable counters

> And while we are at it, I understand that a semaphore is needed to
> increment each per-CPU value, but to read it? Why not read on the fly all
> per-CPU counters and add them to the global counter when a count is
> required?

Exactly what I was thinking! After all a counter is _inherently_ inaccurate: by the time the kernel reports the value back to whomever was asking there's an excellent chance it's changed. So why would it matter if we don't lock the counter before we read it? It seems like there are a number of options that could be considered, although certainly there are complexities as well.


(Log in to post comments)

Mean value

Posted Feb 3, 2006 3:28 UTC (Fri) by obobo (guest, #684) [Link]

In general, you can run into problems where the counter is being incremented as it is being read. Say you're using a (mythical) 8-bit architecture, and the 16-bit counter value is being incremented from 0x00ff to 0x0100. The intermediate state, as the counter is being updated, will be 0x0000 (or perhaps 0x01ff) because only 8 bits are being written at a time. Much worse than the off-by-one situation that I think you were imagining. And worse still if it is a 64 bit counter on a 32 bit system...

Caveat- I don't know anything the specific types mentioned in the article (local_t in particular) so this particular problem might not be an issue there depending on how the atomicity of it works.

Mean value

Posted Feb 3, 2006 10:39 UTC (Fri) by man_ls (guest, #15091) [Link]

So, just stick to the size native to the architecture. On 32-bit machines, never use more than 32 bits for the intermediate values; if you happen to be on a Z80 (not so mythical 8-bit architecture) use a max value of 255 for them.

Incrementing a 32-bit register on a 32-bit architecture is an atomic operation even on multi-processor systems, right?

per-CPU counters and locking

Posted Feb 5, 2006 0:35 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

Who said anything about semaphores with per-CPU counters? The article specifically said there is no locking required for updating a per-CPU counter. It says you never read a per-CPU counter.

Also, there is no locking required to read the global counter.

Now as for the orthogonal issue of getting a complete count instead of just reading the global counter: how are you going to read the per-CPU counter of another CPU? The only way is for that CPU to write it out to main memory and for you to go to main memory to get it. That takes a long time, and that's the slowness that per-CPU counters are supposed to eliminate.

per-CPU counters and locking

Posted Feb 5, 2006 2:11 UTC (Sun) by man_ls (guest, #15091) [Link]

how are you going to read the per-CPU counter of another CPU?
Never thought of that; just another example of why kernel programming sometimes looks so hard to outsiders.
The only way is for that CPU to write it out to main memory and for you to go to main memory to get it. That takes a long time, and that's the slowness that per-CPU counters are supposed to eliminate.
At least it would spare the "expensive locking operations" mentioned in the main article, wouldn't it?

per-CPU counters and locking

Posted Feb 5, 2006 3:13 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

just another example of why kernel programming sometimes looks so hard to outsiders.

This is a relatively recent addition to the things performance programmers have to worry about. Used to be, electronic memory was considered fast, and it was always the same speed. Typically, it took at most 4 cycles to access it. Now, there are 4 levels of electronic memory (including the registers) and the main pool is 200 cycles away. We think of looking at memory now the way we used to think of loading a file from a disk.

I'm not ashamed to say my personal leaning these days is toward single-CPU single-core machines tied together at the network adapter. My brain can handle only so much complexity.

At least it would spare the "expensive locking operations" mentioned in the main article, wouldn't it?

That locking operation is for updating a counter. For accessing one, the only cost is moving the cache line from one CPU to another -- through main memory. Whoops, I just remembered - on fancy modern machines, there's a shortcut to move a cache line without going out on the memory bus. So it's cheaper than what I said earlier, but still more than folks are willing to pay.

per-CPU counters and locking

Posted Feb 9, 2006 4:57 UTC (Thu) by roelofs (guest, #2599) [Link]

Now, there are 4 levels of electronic memory (including the registers)

5 levels sometimes (at least in principle). There exist Intel chips with L3 caches (Xeon Gallatin, I think), and presumably each cache type is a different speed from the others and from main memory.

Greg


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds