Posted Jun 10, 2011 8:13 UTC (Fri) by Ankita (subscriber, #39147)
Parent article: Memory power management
Hi John,
You are right that creating regions inside zones bloats the number of zones. But the difficulty we faced is that zones already encapsulate some boundary information, which might be at a lower level than regions. A region could span multiple zones, in which case we would need another mechanism to group these sub-regions into one region that maps to an independently managed power unit. For instance, a single NUMA node with 8GB of RAM will come up with two zones: ZONE_DMA and ZONE_NORMAL. But if this NUMA node supports power management at a different granularity, say 2GB, then we would create 4 regions, spanning the two zones. Targeted allocation and reclaim would then depend on another piece of information to unite the regions that form one single unit. Further, zone policies such as movable allocations could still be leveraged when zones are under regions. However, as Dave pointed out, it is important to understand the performance impact of this change.
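To make the 8GB example concrete, here is a rough sketch (not code from the patch set) of how 2GB power regions cut across zone boundaries. The 16 MiB size for ZONE_DMA is an assumption borrowed from the usual x86 convention, and `split_regions_by_zone` is a hypothetical helper name used only for illustration.

```python
GIB = 1 << 30
MIB = 1 << 20

def split_regions_by_zone(zones, region_size):
    """Cut each zone at power-region boundaries.

    zones: list of (name, start, end) byte ranges, end exclusive.
    Returns a list of (region_index, zone_name, start, end) pieces.
    """
    pieces = []
    for name, z_start, z_end in zones:
        pos = z_start
        while pos < z_end:
            region = pos // region_size
            region_end = (region + 1) * region_size
            end = min(z_end, region_end)
            pieces.append((region, name, pos, end))
            pos = end
    return pieces

# Assumed layout: one 8 GiB node, ZONE_DMA covering the first 16 MiB
# (an illustrative assumption), ZONE_NORMAL covering the rest.
zones = [("ZONE_DMA", 0, 16 * MIB), ("ZONE_NORMAL", 16 * MIB, 8 * GIB)]
pieces = split_regions_by_zone(zones, 2 * GIB)

for region, name, start, end in pieces:
    print(f"region {region}: {name:11s} {start // MIB:5d} MiB .. {end // MIB:5d} MiB")
```

Under these assumptions the first 2GB power unit ends up as two separate pieces, one in each zone, which is the extra "unite these pieces" bookkeeping described above.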
Also, besides PASR, there are other mechanisms by which memory power can be conserved. The Samsung Exynos 4210, for instance, supports automatic power-down of memory, i.e., if there are no references to certain areas of memory for a certain threshold of time, the hardware automatically puts that area of memory into a lower power state without losing its contents. A basic infrastructure that makes the VM aware of the hardware topology would help the hardware place memory into lower power states.
Posted Jun 10, 2011 14:24 UTC (Fri) by ccurtis (guest, #49713)
Is it possible to estimate memory power savings by comparing power usage with the full amount of RAM against usage with a restricted amount of RAM, set via a kernel boot parameter? The unused RAM would probably still need to be powered off manually, but this seems like a simpler patch to start with. (For testing this I'm assuming an idle base-load system that fits completely within the available RAM with all swap disabled, of course.)
Memory power management power savings test
Posted Jun 10, 2011 14:30 UTC (Fri) by mjg59 (subscriber, #23239)
We don't have any way to power down memory on existing x86 hardware, as far as I know.
Memory power management power savings test
Posted Jun 10, 2011 15:05 UTC (Fri) by ccurtis (guest, #49713)
Perhaps I was a bit too terse.
The amount of power actually saved is a bit unclear; estimates seem to run in the range of 5-15% of the total power used by the memory subsystem.
[...]
A recent patch set from Ankita Garg does not attempt to solve the whole problem; instead, it creates an initial infrastructure which can be used for future power management decisions.
Before creating this extensive infrastructure, perhaps it would be better to get an idea of what kind of power savings this actually provides. Code would still need to be written to power down the memory bank, and code would also likely need to be written to isolate the excluded RAM from the boot parameter, but this seems like a relatively easy way to answer the question before embarking on the endeavor.
Of course, this may be a done deal and it's just a matter of time before the code gets written, but it would still be interesting to see how much power is actually going to be saved. A patch like this would also allow individuals to measure the power savings on their own systems, in case they wanted to weigh those savings against any overhead the new memory management changes might impose.
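The comparison itself is simple arithmetic once two readings exist. A minimal sketch follows, with a hypothetical helper `savings_percent` and placeholder wattages that are not measurements from any real system:

```python
def savings_percent(full_watts, restricted_watts):
    """Relative saving, in percent, of the restricted-RAM run versus the full-RAM run."""
    if full_watts <= 0:
        raise ValueError("full_watts must be positive")
    return (full_watts - restricted_watts) / full_watts * 100.0

# Placeholder readings for illustration only; substitute real
# measurements taken on the same idle workload in both configurations.
full_ram_watts = 50.0
restricted_ram_watts = 47.5
print(f"saving: {savings_percent(full_ram_watts, restricted_ram_watts):.1f}%")
```

Note that this gives the saving relative to whatever was measured (typically whole-system power), not relative to the memory subsystem alone.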
Memory power management power savings test
Posted Jun 10, 2011 16:20 UTC (Fri) by etienne (subscriber, #25256)
Maybe just reboot your PC with a different number of memory modules and measure the power difference?
Memory power management power savings test
Posted Jun 11, 2011 3:19 UTC (Sat) by willy (subscriber, #9762)
The problem is that pages are interleaved across DIMMs. If you have 3 channels with a single DIMM each, cacheline 0 is on DIMM 0, cacheline 1 on DIMM 1, cacheline 2 on DIMM 2 and cacheline 3 on DIMM 0. Removing a DIMM causes the interleaving to change, which will also cause performance to change, and your measurements are now invalid.
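As a rough illustration of that mapping, the sketch below assumes 64-byte cachelines and plain round-robin interleaving. Real memory controllers may use more elaborate address hashing, so treat `dimm_for_address` as a hypothetical simplification.

```python
CACHELINE_BYTES = 64  # assumed cacheline size

def dimm_for_address(addr, num_dimms):
    """Round-robin interleaving: consecutive cachelines go to consecutive DIMMs."""
    cacheline = addr // CACHELINE_BYTES
    return cacheline % num_dimms

# The same six cachelines land on different DIMMs once one DIMM is removed.
three = [dimm_for_address(i * CACHELINE_BYTES, 3) for i in range(6)]
two = [dimm_for_address(i * CACHELINE_BYTES, 2) for i in range(6)]
print("3 DIMMs:", three)
print("2 DIMMs:", two)
```

With three DIMMs the pattern repeats every three cachelines; with two it repeats every two, so the access pattern across channels is no longer comparable between the two runs.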
As I understand PASR, one would not power down an entire DIMM, but rather sections of each DIMM, thus preserving the performance benefits of interleaving.
Memory power management power savings test
Posted Jun 16, 2011 4:54 UTC (Thu) by Ankita (subscriber, #39147)
Yes, PASR support seems to be at the bank level. From the few documents I have read, interleaving can be configured to control the number of banks that have to be kept open for every memory access, with the minimum typically being 2 banks. Thus, if only two banks are interleaved, the other banks can potentially be turned off or left unrefreshed. Power versus performance benchmarking will be needed to decide on the best interleaving scheme, though.
Memory power management power savings test
Posted Jun 13, 2011 14:11 UTC (Mon) by nye (guest, #51576)
So, back in 2003 I built my first system with 1GB of RAM. Initially I thought I would save money by re-using a number of existing components, including the power supply.
Sadly, the existing power supply proved inadequate and the system would crash shortly after booting, if it booted at all. The memory was a single DIMM, so there wasn't anything I could remove to reduce power consumption. However, it turned out that telling the kernel to use only about 800M allowed the system to remain completely stable for the few days it took to get a new power supply.
Thus I conclude that the most likely explanation is that RAM which the OS doesn't believe even exists actually does use less power than RAM which is simply not in use at the time.
Memory power management power savings test
Posted Jun 16, 2011 18:26 UTC (Thu) by Pc5Y9sbv (guest, #41328)
You may have just been lucky that the most voltage-sensitive circuits were active in the "upper" region you excluded. It's not likely that the memory was statically powered down because it wasn't in use, but there are many dynamic power loads in digital circuits as they change states.
It could even have to do with certain combinations of address and data bits that required more power to configure the addressing logic and route the data signals, and it didn't stabilize within the configured access timings.
The end result is that certain memory addresses in a given module will tend to show corruption before others as the power supply sags or the timings get too tight. This is why people advocate long runs with a dedicated memory test program to try to validate parts in situ. Just running an OS may not exercise combinations of address and data bits with sufficient testing coverage, at least not for many hours (or weeks!) of operation.