LWN: Comments on "Application-friendly kernel interfaces"

liblinux

slamb — Fri, 06 Apr 2007 01:06:56 +0000

I'm not sure about "would always be in sync". If you require that, people who have multiple kernels on their box would need some mechanism such that the correct liblinux for whatever kernel they happened to boot is dynamically loaded. Seems possible, but it's a step beyond "maintained from the same source".

This is one of those areas where the BSDs have an easier time. They do "make world", and it's just inconceivable that an actual end user would mix'n'match kernel and userspace from different versions of FreeBSD. They got away with things like top assuming layout of kernel structures and accessing /dev/kmem for a long time. On Linux, that sort of mutt system is considered normal, so stuff has to be carefully versioned.

Why add anything?

joib — Thu, 05 Apr 2007 18:57:44 +0000

One problem is that the number of large page TLB entries is quite limited. E.g. on current Opterons, while you have a 512-entry data TLB for the normal 4K pages, for the 2M large pages you only have 8 entries. So if you have a loop kernel reading/writing from more than 8 big arrays you're going to have TLB thrashing.

I would presume that for non-HPC applications these non-streaming, irregular access patterns are even more common. Though supposedly AMD is fixing this issue with the upcoming 'Barcelona' by having 128 2M TLB entries, and additionally supporting 1G pages (don't know how many TLB entries for those).

For comparison, the Intel Woodcrest has 256 4K and 32(?) 2M TLB entries.
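
To put the Opteron figures above in terms of TLB reach (entries times page size), here is a small arithmetic sketch. The entry counts are the ones quoted in this comment and are not independently verified; `tlb_reach` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)

/* TLB reach: how much memory the TLB can map at once without a miss. */
uint64_t tlb_reach(uint64_t entries, uint64_t page_bytes)
{
    /* Guard against overflow; the figures discussed here are far below it. */
    assert(page_bytes != 0 && entries <= UINT64_MAX / page_bytes);
    return entries * page_bytes;
}
```

With these numbers, tlb_reach(512, 4 * KIB) is 2 MiB and tlb_reach(8, 2 * MIB) is 16 MiB, so the large-page TLB covers more memory in total but can only track 8 separate regions at a time. With 128 entries, tlb_reach(128, 2 * MIB) would be 256 MiB.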

Application-friendly kernel interfaces

joib — Thu, 05 Apr 2007 18:38:43 +0000

For large pages, there's libhugetlbfs, so you can use large pages via LD_PRELOAD without changing the application itself.

Why add anything?

farnz — Thu, 05 Apr 2007 14:15:13 +0000

Might be worth looking at Linux-mm.org on huge pages. In particular, there's a link to an LWN article on transparent use of huge pages. The "holy grail" is very definitely transparent use, so that whenever possible, all applications gain; anything that makes it easier to move that way is helpful.

One thought: if your mmap parameter is simply a hint that the block will be used at a particular granularity, it's easy to implement. Current mmap sets the parameter to 1 byte (no granularity needed), unless mmapping hugetlbfs pages, when it sets the parameter to (e.g.) 16M. The kernel then just rounds up to the next highest available page size when possible, or down if not.
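
The rounding rule could be sketched roughly as follows. This is only an illustration of the idea, not kernel code; `choose_page_size` and its parameters are hypothetical, and the list of available page sizes is assumed to be sorted in ascending order.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: pick a page size for a granularity hint.
 * `avail` lists the supported page sizes in ascending order.
 */
size_t choose_page_size(size_t hint, const size_t *avail, size_t n)
{
    assert(avail != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        if (avail[i] >= hint)
            return avail[i];    /* round up to the next available size */
    }
    return avail[n - 1];        /* nothing big enough: round down to the largest */
}
```

On a machine offering only 4K and 2M pages, a hint of 1 byte would select 4K, while a hint of 16M would fall back to 2M.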

Why add anything?

giraffedata — Fri, 30 Mar 2007 23:19:16 +0000

I can see the value of using mmap() for this, but I don't think you want mmap() guessing based on the size of the request what page size is best.

It's quite possible that 16M of memory will consist of 100 scattered 4K pages of working set and the rest rarely used or even vacant. You wouldn't want to page the whole 16M in and out in that case.

Page granularity seems like a perfectly sensible parameter of an mmap, though.

Why add anything?

mjr — Fri, 30 Mar 2007 12:59:16 +0000

I'm wondering the same myself. I'm not much for low level hacking, but I fail to see what benefits one would reap from yet another interface.

If a separate interface is really necessary for some reason, I'd put the same functionality behind regular libc malloc(); it already does brk() for small allocations and mmap() for large ones I believe, so it could just as well do extra-large allocations via the hugetlb API. (Putting this in malloc instead of mmap would get rid of the partial-munmap issue on the libc end.)
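
A rough sketch of that three-tier dispatch, for illustration only. The 128 KiB figure is glibc's documented default for M_MMAP_THRESHOLD; the huge-page threshold and all names here are hypothetical, and a real allocator is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

enum backend { USE_BRK, USE_MMAP, USE_HUGETLB };

#define MMAP_THRESHOLD ((size_t)128 * 1024)       /* glibc default M_MMAP_THRESHOLD */
#define HUGE_THRESHOLD ((size_t)2 * 1024 * 1024)  /* hypothetical: one 2M huge page */

static_assert(MMAP_THRESHOLD < HUGE_THRESHOLD, "tiers must be ordered");

/* Decide which mechanism would back an allocation of `size` bytes. */
enum backend pick_backend(size_t size)
{
    if (size >= HUGE_THRESHOLD)
        return USE_HUGETLB;     /* extra-large: go through the hugetlb API */
    if (size >= MMAP_THRESHOLD)
        return USE_MMAP;        /* large: a separate anonymous mapping */
    return USE_BRK;             /* small: carve from the brk() heap */
}
```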

Why add anything?

ncm — Fri, 30 Mar 2007 06:16:30 +0000

Why should this need a file system, or a device, or a library at all?

It should suffice to call mmap() and ask for an anonymous chunk of 16M, and the kernel can simply recognize that a hugetlb would serve, and use it. If, later, the process unmaps pages within it, the remaining pages can be switched over to the regular mapping scheme; most processes won't. Then it would be easy, safe, and backward-compatible for libc to switch malloc over to allocating hugetlb chunks by default, benefitting everybody.

I would also like to see a flag added to mmap() to require that the mapped block be aligned to match its size; e.g. ask for 16M and the bottom 24 bits of the returned address are 0. (Anybody else remember when 68K chips shipped with only 24 address pins, and Apple stuck annotations in the top 8 bits of addresses because the hardware ignored those bits?)
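
Without such a flag, the usual userspace workaround is to over-allocate and trim. A minimal sketch, assuming `size` and `align` are both multiples of the system page size and `align` is a power of two; `mmap_aligned` is a hypothetical name, and this yields an aligned address but does not by itself make the kernel use huge pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map `size` bytes of anonymous memory at an address aligned to `align`. */
void *mmap_aligned(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */

    /* Over-allocate so an aligned block of `size` bytes must fit inside. */
    size_t span = size + align;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return MAP_FAILED;

    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + align - 1) & ~(uintptr_t)(align - 1);
    size_t head = aligned - addr;           /* unused bytes before the block */
    size_t tail = span - head - size;       /* unused bytes after the block */

    /* Give back the slack on both sides (errors ignored in this sketch). */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)aligned + size, tail);

    return (void *)aligned;
}
```

Asking for 16M with 16M alignment returns an address whose bottom 24 bits are zero, at the cost of briefly reserving twice the address space.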

Really?

IkeTo — Fri, 30 Mar 2007 05:40:33 +0000

> I mean, you design a hard-to-use interface, then write your own code which
> presents a friendly interface to userspace -- and you write it in
> userspace. Well, why not present a friendly interface in the kernel in the
> first place?

Perhaps the whole hugetlb thing illustrates one of the possible reasons. The original /dev/hshm interface is actually more general than the /dev/hugetlb interface: it allows multiple processes unrelated in ancestry to share the same huge page. It is probably preferable for the kernel API to offer only the general interface rather than having to implement both, since every time the interface changes there has to be a "global search" for libraries/applications using the interface, with enough time left for those libraries/applications to change (if Linus does not say "no" to the change right away). So it might be preferable to implement just the general interface, hoping that it will never change at all, and have another library "cast" it to various "more friendly" forms like the hugetlb interface. What is unclear to me is why one would expect the new library to be exempt from the global search if it needs to be changed.

I think instead of a general liblinux, we should be content with the tested solutions of, e.g., pthread (futex) and libfam (dnotify): if the functionality fits a general audience, the easier interface is implemented in libc, and if it does not, the easier interface is implemented in a functionality-specific library. That way, when the generic interface is changed, the kernel developers have fewer places to search for direct users of it; and the specific interface is usable (and thus relied upon) by a narrower set of end-user applications.

Really?

cpeterso — Fri, 30 Mar 2007 01:16:20 +0000

> Is it just because kernel->userspace interfaces are set in stone and have to be maintained forever? For that would feel a bit like medieval astronomers -- weaving layer over layer of epicycles so that their spheres would match the real planet trajectories. Here we would have a kernel interface set in stone, then some library code -- which once people use it would again be set in stone, only to add a new glue layer... again and again. Waiting a few iterations might be a better course of action, and I gather from LWN that it is often taken by kernel devs.

I think the kernel API can change, so user programs should use the "friendly" userspace library APIs.

Really?

man_ls — Thu, 29 Mar 2007 22:21:09 +0000

Do you really think it is a great idea? Pardon for my lack of knowledge about kernel development, but why is it so great? I mean, you design a hard-to-use interface, then write your own code which presents a friendly interface to userspace -- and you write it in userspace. Well, why not present a friendly interface in the kernel in the first place?

Is it just because kernel->userspace interfaces are set in stone and have to be maintained forever? For that would feel a bit like medieval astronomers -- weaving layer over layer of epicycles so that their spheres would match the real planet trajectories. Here we would have a kernel interface set in stone, then some library code -- which once people use it would again be set in stone, only to add a new glue layer... again and again. Waiting a few iterations might be a better course of action, and I gather from LWN that it is often taken by kernel devs.

If the purpose of this scheme is to have a more powerful interface, I much prefer our editor's suggestion:

A separate library for developers trying to do obscure and advanced things with the kernel might be the right solution.

I have seen too many complex interfaces that nobody uses because they are so complex, and everyone uses the simplified version. Better start simple, and then add complexity as needed.

Application-friendly kernel interfaces

vmole — Thu, 29 Mar 2007 16:02:23 +0000

Strike 2! [Steve readies for the next pitch.]

Yeah yeah, I know that code written for forums like this is at best pseudo-code. Hell, I blew it just the other day, so I'm hardly the one to be picking on you, but I was amused by the "Show me the code" - "huh?" sequence.

Perhaps we can get away with claiming "Well, it was actually a debugging test for the reader". Right, that's it.

Application-friendly kernel interfaces

ebiederm — Thu, 29 Mar 2007 15:38:45 +0000

Yea yea.

snprintf(buffer, sizoef(buffer), ....);

Application-friendly kernel interfaces

vmole — Thu, 29 Mar 2007 15:16:24 +0000

snprintf(buffer, "%s/XXXXXX", PATH_TO_HUGETLBFS);

So much for working code... ;-)

liblinux

jospoortvliet — Thu, 29 Mar 2007 11:54:52 +0000

Indeed, it really sounds like a great idea. This way, systems like GTK/glibc and Qt/kdelibs could link to this library, or even use it only when available to speed some things up, while using workarounds on other OSes like the BSDs, Solaris, Mac OS X etc.

liblinux

hummassa — Thu, 29 Mar 2007 11:30:32 +0000

This would also permit the kernel devs to further experiment in yanking
functionality out of the kernel... things that _could_ be done in
userspace without performance penalties _should_ be done in userspace :-)
linux + liblinux would be maintained from the same source -- so they would
always be in sync -- and this would be really great.

Application-friendly kernel interfaces

ms — Thu, 29 Mar 2007 08:21:13 +0000

I think this is a great idea. This allows for greater decoupling between glibc and the Linux kernel and is, IMHO, the proper abstraction. Plus, if the authors of the kernel interfaces are subsequently charged with writing liblinux entries then there could well be cases where the authors would rather return to the drawing board and rethink the kernel interface if it's just too damn hard to use from userspace.

Application-friendly kernel interfaces

ebiederm — Thu, 29 Mar 2007 06:02:46 +0000

Huh?

#define PATH_TO_HUGETLBFS "/dev/hshm"

void *map_anon_hugetlb(size_t size)
{
    char buffer[PATH_MAX];
    int fd;

    snprintf(buffer, "%s/XXXXXX", PATH_TO_HUGETLBFS);
    fd = mkstemp(buffer);
    if (fd < 0)
        return MAP_FAILED;
    unlink(buffer);
    ftruncate(fd, size);
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
}
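
For anyone who wants to compile the snippet, here is a sketch with the headers added and the snprintf() size argument that the replies in this thread point out is missing. It is not the original poster's code: the function name, the `dir` parameter (used so it can be tried on any writable directory rather than a hard-coded hugetlbfs mount) and the extra error checks are assumptions of this sketch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `size` bytes backed by an unlinked temporary file created in `dir`. */
void *map_anon_file(const char *dir, size_t size)
{
    char buffer[PATH_MAX];
    void *addr;
    int fd, n;

    assert(dir != NULL);

    /* snprintf() takes the buffer size as its second argument. */
    n = snprintf(buffer, sizeof(buffer), "%s/XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(buffer))
        return MAP_FAILED;      /* path would not fit */

    fd = mkstemp(buffer);
    if (fd < 0)
        return MAP_FAILED;
    unlink(buffer);             /* the file lives on until unmapped and closed */

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return MAP_FAILED;
    }

    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping keeps its own reference */
    return addr;
}
```

Passing the hugetlbfs mount point as `dir` gives the behaviour the snippet above was aiming for; on hugetlbfs, `size` generally needs to be a multiple of the huge page size.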

Application-friendly kernel interfaces

orospakr — Thu, 29 Mar 2007 04:32:09 +0000

liblinux, eh?

now that's an interesting idea.

Application-friendly kernel interfaces

jreiser — Thu, 29 Mar 2007 03:07:26 +0000

It's not possible to do normal reads and writes from this filesystem [hugetlbfs] ...

and that makes hugetlbfs less than a filesystem. Hugetlbfs is a hack, and it is hard to use. Hugetlbfs is so hard to use that our editor could not find an actual working example to cite. Show me the code!