LWN: Comments on "Virtual Memory I: the problem" https://lwn.net/Articles/75174/ This is a special feed containing comments posted to the individual LWN article titled "Virtual Memory I: the problem". en-us Fri, 10 Oct 2025 12:36:47 +0000 Fri, 10 Oct 2025 12:36:47 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Virtual Memory I: the problem https://lwn.net/Articles/76575/ https://lwn.net/Articles/76575/ alpharomeo Some questions/comments:<p>1) Why not cause the kernel to manage memory in de facto pages larger than 4K? A larger page is how other OSes manage large memory efficiently, whether we are talking 32 bit or 64 bit addressing. To avoid breakage, why not keep the current 4K page size but have the kernel always allocate/free pages in blocks of, say, 16 pages, or even 256 pages? Then the page table would be vastly smaller.<p>2) It may be convenient to say &quot;use a 64 bit machine&quot;. The fact is that 64 bit addressing is inefficient and overkill for many situations. Experience with several architectures that support simultaneous 32-bit and 64-bit applications has shown that the 64-bit builds run at least 25-30% slower than the corresponding 32-bit builds. Bigger addresses mean bigger programs, bigger stacks and heaps, etc. Some applications may require nearly twice as much memory when run in 64-bit mode. So, why not optimize the kernel to provide better support for 32-bit addressing? In particular, what is so wrong with supporting infinite physical memory but limiting process address space to 4 GB (or 3 GB)?<p>3) Why is shared memory such a big problem? We have never been able to get Linux to allow shared memory segments larger than slightly less than 1 GB. Is there some trick to it?
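A back-of-the-envelope sketch of point 1 above: with one flat entry per page (illustrative numbers only; real i386 page tables are two-level and per-process), growing the effective page size shrinks the bookkeeping proportionally.

```python
# Page-table entries needed to describe 4 GiB of memory at several
# page sizes, assuming one flat 4-byte entry per page (an illustration,
# not the real two-level i386 layout).
PHYS_MEM = 4 * 1024**3  # 4 GiB to map
PTE_SIZE = 4            # bytes per 32-bit page-table entry

entries = {kib: PHYS_MEM // (kib * 1024) for kib in (4, 64, 1024)}
for kib, n in entries.items():
    print(f"{kib:>4} KiB pages: {n:>8} entries, {n * PTE_SIZE // 1024} KiB of tables")
```

At 4 KiB pages that is 4 MiB of flat entries; at 64 KiB it drops to 256 KiB, which is the saving the 16-page allocation blocks are reaching for.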
Linux reports that there is insufficient swap space, but it does not matter how many swap partitions you allocate - you always get the same error.<p>Thanks!<p> Sun, 21 Mar 2004 01:02:40 +0000 Virtual Memory I: the problem https://lwn.net/Articles/76546/ https://lwn.net/Articles/76546/ mysticalreaper As the other reply stated, SGI's Altix runs on Intel's Itanium 2 processors, which are 64 bit and thus can address a silly amount of memory (2^64 bytes), so they do not suffer these problems. Sat, 20 Mar 2004 01:35:22 +0000 Why only 1 G is directly accessible https://lwn.net/Articles/75495/ https://lwn.net/Articles/75495/ Duncan First, keep in mind that we are talking about a less than 4 gig address <br>space, the physical limit of the &quot;flat&quot; memory model, 32 bits of address, <br>with each address serving one byte of memory. One can of course play with <br>the byte-per-address model and make it, say, two bytes or a full 32-bit <br>4-bytes, but there again, we get into serious compatibility problems with <br>current software that assumes one-byte handling. The implications of that <br>would be HUGE, and NOBODY wants to tackle the task of ensuring <br>4-byte-per-address clean code, since the assumption has been <br>byte-per-address virtually forever and virtually ALL programs have that <br>axiom written so deep into their code you might as well start over again <br>(which is sort of what Intel argued should be the case with Itanic, clean <br>start approach, anyway, taking the opportunity to move cleanly to 64-bit, <br>which is why it never really took off, but that's an entirely different <br>topic). It's simply easier to move to 64-bit address space than to tinker <br>with the byte-per-address assumption. Thus, 32-bit is limited to 4-gig of <br>directly addressable memory in any practical case. <br> <br>Another solution, as generally used back in the 16-bit era, is called <br>&quot;segmented&quot; memory.
The address back then consisted of a 16-bit &quot;near&quot; <br>address, and a 16-bit &quot;segment&quot; address. The issue, as one would expect, <br>amounted to one of performance. It was comparatively fast to access <br>anything within the same segment, much slower to address anything OUT of <br>the segment. As it happened, 64k was the segment size, and if you <br>remember anything from that era, it might be that editors, for instance, <br>quite commonly had a limit on the size of the editable file of somewhat <br>less than 64k, so they could access both their own operational memory AND <br>the datafile being edited, all within the same 64k segment. However, the <br>benefits of &quot;flat&quot; memory are such that few want to go back to a segmented <br>memory model, if at all possible to stay away from it. (That said, the <br>various high memory models do essentially that, but try to manage it at <br>the system level so at least individual applications don't have to worry <br>about it, as they did back in the 16-bit era.) <br> <br>That still doesn't &quot;address&quot; (play on words intentional) the lower 1-gig <br>kernel-space, 3-gig user-space, &quot;soft&quot; limit. As you mention, yes, in <br>theory the kernel /can/ address the full 4-gig. The problem, however, is <br>hinted at elsewhere in the article where it talks about the 4G/4G patch -- <br>use all available direct address space for the kernel, and switching <br>between usermode and kernelmode becomes even MORE tremendously expensive <br>than it already is, in performance terms, because if they use the same <br>address space, the entire 4-gig &quot;picture&quot; has to be flushed (more on that <br>below), so the new &quot;picture&quot; of the other mode can be substituted without <br>losing data.
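Duncan's 16-bit segment:offset scheme can be sketched numerically. The arithmetic below assumes 8086-style real mode (segment register shifted left four bits), which his description does not name explicitly:

```python
def real_mode_addr(segment: int, offset: int) -> int:
    """8086 real mode: 20-bit physical address = segment * 16 + offset."""
    return ((segment << 4) + offset) & 0xFFFFF

# Within one 64 KiB segment only the 16-bit offset changes (the fast case);
# touching memory outside it means reloading a segment register (the slow case).
assert real_mode_addr(0x1234, 0x0010) == 0x12350
# Overlapping segments can name the same byte two different ways:
assert real_mode_addr(0xB800, 0x0000) == real_mode_addr(0xB000, 0x8000)
```

The 64k editor limits Duncan mentions fall out of the 16-bit offset: a single segment register covers exactly 0x0000 through 0xFFFF.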
As explained in the article, each mode then has to manage its <br>own memory picture, and the performance issues of flushing that picture so <br>another one can replace it at each context switch are enormous. <br> <br>As already mentioned in other replies, there are a number of solutions, <br>each with their own advantages and disadvantages. One is the 2G/2G split, <br>which BTW is what MSWormOS uses. This symmetric approach allows both the <br>kernel and userspace to access the same four gig maximum &quot;picture&quot;, each <br>from their own context, but sharing the picture, so the performance issues <br>in flushing it don't come into play. It does give the kernel more <br>comfortable room to work in, but at the expense of that extra gig for <br>userspace. While few applications need more than their two-gig share of <br>memory to work in, the very types of applications that do, huge database <br>applications and other such things, happen to be run on the same sorts of <br>systems that need that extra room for the kernel.. huge enterprise systems <br>with well over eight gig of physical memory. Thus, the 2G/2G solution is <br>a niche solution that will fit only a very limited subset of those running <br>into the problem in the first place. The 4G/4G solution is more practical <br>-- EXCEPT that it carries those huge performance issues. Well, there's <br>also the fact that even a 4G/4G solution only doubles the space available <br>to work with, and thus is only a temporary solution at best, perhaps two <br>years' worth, maybe 3-4 by implementing other &quot;tricks&quot; with their own <br>problems, even if the base performance issue didn't apply. That's where <br>the next article comes in. <br> <br>The loose end left to deal with is that flushing, mentioned above.
I must <br>admit to not fully understanding this myself, but a very simplistic view <br>of things would be to imagine a system with 8 gig of physical memory, <br>dealt with using the previously mentioned &quot;segments&quot;, of which there would <br>be two, one each for userspace and kernel space. A mode switch would then <br>simply mean changing the segment reference, ensuring all cache memory is <br>flushed out to the appropriate segment before one does so, of course. <br> <br>Practice, of course, rarely matches that concept, <br>however, and even if a system DID happen to have exactly eight gig of <br>memory, such a simplistic model wouldn't work in real life because of <br>/another/ caveat.. that being that each application has its own virtual <br>address space map, and few actually use the entire thing, so one would be <br>writing to swap (a 100 to 1000 times slower solution than actual memory, <br>so a generally poor solution if not absolutely necessary) entirely <br>unnecessarily with only one application being runnable at once. <br> <br>That of course is where VM (virtual memory) comes in, as it allows all the <br>space unused by one app or the kernel itself to be used by another, with <br>its own remapping solutions. However, that's the part I don't really <br>understand, so won't attempt to explain it. Besides, this post is long <br>enough already. &lt;g&gt; Just understand that flushing is a necessary process <br>of low enough performance that it should be avoided if possible, and that <br>the concept is one of clearing the slate so it can be used for the new <br>memory picture, while retaining the data of the first one so it can be <br>used again. <br> <br>Duncan <br> Fri, 12 Mar 2004 09:51:00 +0000 Virtual Memory I: the problem https://lwn.net/Articles/75466/ https://lwn.net/Articles/75466/ jmshh The keyword here is &quot;directly&quot;, i.e. without any manipulation of page <br>tables.
So all physical RAM has to live inside the 1GB virtual address <br>space of the kernel, together with some other stuff, like video buffers. <br> Thu, 11 Mar 2004 22:17:04 +0000 Virtual Memory I: the problem https://lwn.net/Articles/75447/ https://lwn.net/Articles/75447/ mmarkov <blockquote> If the kernel wishes to be able to access the system's physical memory directly, however, it must set up page tables which map that memory into the kernel's part of the address space. With the default 3GB/1GB mapping, the amount of physical memory which can be addressed in this way is somewhat less than 1GB - part of the kernel's space must be set aside for the kernel itself, for memory allocated with vmalloc(), and various other purposes. </blockquote> Honestly, I don't understand here why only 1GB is accessible under these premises. <p> PS Great article, Jon. In fact, great articles, both part I and part II. Thu, 11 Mar 2004 18:36:47 +0000 Altix https://lwn.net/Articles/75434/ https://lwn.net/Articles/75434/ corbet <blockquote> "<i>I thought SGI's Altix was able to handle a huge amount of memory. Does anyone know if SGI's kernel also uses the 4G/4G patch?</i>" </blockquote> <p> I do believe that Altix systems are Itanium-based, so they don't have to deal with all this obnoxious stuff. Thu, 11 Mar 2004 17:30:32 +0000 Virtual Memory I: the problem https://lwn.net/Articles/75432/ https://lwn.net/Articles/75432/ parimi Jon, Thanks for such an informative article! <p>I thought SGI's Altix was able to handle a huge amount of memory. Does anyone know if SGI's kernel also uses the 4G/4G patch? 
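jmshh's point about the "directly" mapped region is also why the article says "somewhat less than 1GB": the usual i386 numbers are sketched below (the 128 MiB reservation for vmalloc and friends is a typical default, not a universal constant).

```python
GiB = 1024**3
MiB = 1024**2

kernel_window = 1 * GiB   # kernel's share of the 3GB/1GB split
reserved = 128 * MiB      # vmalloc area, fixmaps, etc. (typical default)
lowmem = kernel_window - reserved
print(f"directly mapped low memory: {lowmem // MiB} MiB")  # 896 MiB
```

Anything above that ~896 MiB line is "high memory" and has to be mapped in and out via page-table manipulation, which is the whole highmem problem the article describes.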
Thu, 11 Mar 2004 17:23:15 +0000 Virtual Memory I: the problem https://lwn.net/Articles/75344/ https://lwn.net/Articles/75344/ nix If that were true, any given process would only be able to address a third of the physical RAM in the system (on a fully-populated non-highmem box).<p>This is considered silly, since process virtual memory requirements can be far higher than their physical requirements. :) Thu, 11 Mar 2004 11:51:51 +0000 Virtual Memory I: the problem https://lwn.net/Articles/75338/ https://lwn.net/Articles/75338/ axboe Nah, it is you who gets it backward. Thu, 11 Mar 2004 10:34:54 +0000 Virtual Memory I: the problem https://lwn.net/Articles/75328/ https://lwn.net/Articles/75328/ dale77 Yep, buy yourself an AMD64.<p>Perhaps one of these:<p>http://www.gamepc.com/shop/systemfamily.asp?family=gpdev<p>Dale Thu, 11 Mar 2004 09:06:40 +0000 Virtual Memory I: the problem https://lwn.net/Articles/75326/ https://lwn.net/Articles/75326/ dlang I think you have the 3:1 split backwards. As I understand it, the kernel gets the 3G portion and userspace gets 1G.<p>There are patches to allow you to change it to 2G:2G and I believe I've seen either a 3:1 or a 2.5:1.5 (I don't remember which at the moment), but as you cut down the amount of address space available to the kernel, other problems become more common. Thu, 11 Mar 2004 08:20:04 +0000 Virtual Memory I: the problem https://lwn.net/Articles/75311/ https://lwn.net/Articles/75311/ oconnorcjo I agree with Linus on this one. Opterons and AMD64 are out now and Intel will have an x86-64 in the next year or so. If you really need the RAM then one might as well get a 64 bit chip to go with it. Racking up the RAM on a 32 bit system is like stuffing 10 pounds of potatoes in a five pound bag. Thu, 11 Mar 2004 05:27:24 +0000