In brief

By Jonathan Corbet
August 19, 2009

Low-memory mapping. The recent set of null-pointer vulnerabilities has not been helped by the confusion around how the mmap_min_addr knob and security modules interact. The 2.6.31-rc7 kernel will see a couple of changes intended to clarify and rationalize this interaction. With these patches, SELinux will no longer bypass the mmap_min_addr check; any process wanting to map memory below that address will require the CAP_SYS_RAWIO capability. However, SELinux will also implement its own low-memory limit, controlled by a separate knob in the SELinux policy. As a result, it will be possible to turn off the mmap_min_addr protection but still only allow specific programs to do low-memory mapping.
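To see what the mmap_min_addr check means in practice, a small user-space probe can read the sysctl and then attempt a fixed mapping at address zero. This is only a sketch: the function names read_mmap_min_addr() and try_low_mapping() are made up for illustration, and the error code a refused process sees (typically EPERM or EACCES) depends on which check turns the request down.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Read the current vm.mmap_min_addr value; returns -1 if it cannot be read. */
long read_mmap_min_addr(void)
{
    long value = -1;
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/*
 * Try to map one anonymous page at address zero (hypothetical helper).
 * Returns 0 if the kernel allowed the mapping, otherwise the errno value.
 */
int try_low_mapping(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap((void *)0, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    /* A successful mapping here legitimately has address 0, so the
     * result must be compared against MAP_FAILED rather than NULL. */
    if (p == MAP_FAILED)
        return errno;
    munmap(p, len);
    return 0;
}
```

On a system with a non-zero mmap_min_addr, an unprivileged caller should expect try_low_mapping() to return a non-zero errno value; a process holding CAP_SYS_RAWIO (and permitted by any security-module policy in force) would get zero back.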

Module loading security. Another SELinux-related change - not yet merged into the mainline - adds a new hook to request_module(). The idea here is to try to limit the ability of user-space programs to load arbitrary modules into a running kernel. In future versions of the SELinux policy, the ability to trigger module loads is likely to be reduced to a much smaller set of roles.

HugeTLB mappings. The "hugetlb" feature allows processes to create pseudo-anonymous memory mappings backed by pages which are larger (perhaps much larger) than the normal system page size. For certain kinds of applications, these mappings can improve performance by reducing pressure on the CPU's translation lookaside buffer (TLB). The kernel code resides within such a mapping for the same reason. Using hugetlb pages in user space is a bit awkward, though; it requires mounting the special hugetlbfs filesystem and mapping files from there.

Eric Munson has put together a patch implementing an easier way. With this patch, the mmap() system call understands a new MAP_HUGETLB flag; when that flag is present, the kernel will attempt to create a mapping backed by huge pages. Underneath it all, the mapping is still implemented as a hugetlbfs file, but user space need no longer be aware of that fact.
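From user space, the new flag would be used roughly as follows. This is a sketch under stated assumptions rather than code from the patch: the helper names are invented, the fallback value for MAP_HUGETLB is the one used on x86 and most other architectures, and the 2MB huge page size is an x86-64 assumption. The request will simply fail if no huge pages have been reserved on the system.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000 /* assumption: value on x86 and most architectures */
#endif

/* Assumption: 2MB huge pages; the length must be a multiple of this. */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Ask for an anonymous mapping backed by huge pages; NULL on failure. */
void *map_huge_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Map and release one huge page; returns 0 on success, -1 on failure. */
int huge_mapping_demo(void)
{
    void *p = map_huge_region(HUGE_PAGE_SIZE);

    if (p == NULL) {
        perror("mmap(MAP_HUGETLB)");
        return -1;
    }
    munmap(p, HUGE_PAGE_SIZE);
    return 0;
}
```

Note that no hugetlbfs mount point or file descriptor appears anywhere in the caller; that is the whole point of the flag.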

spin_is_locked(). Kernel code can test the current state of a spinlock with spin_is_locked(). But what should this function return on a single-processor system, where spinlocks do not exist at all? Kumar Gala ran into trouble because one uniprocessor spin_is_locked() implementation returned zero. The problem was code in this form:

    /* Ensure we have the requisite lock */
    BUG_ON(!spin_is_locked(&some_lock));

So Kumar thinks that the return value should always be true. But there are other situations where that is just the wrong thing to do; Linus gave an example where code is waiting for a lock to become free.

The real problem is that a predicate like spin_is_locked() simply lacks a well-defined meaning when the spinlock does not exist. So there is no way to always give the "right" answer in such situations. What may happen instead is that, in a future kernel, spin_is_locked() will be deprecated. Instead, there will be new expect_spin_locked() and expect_spin_unlocked() primitives for testing the state of a spinlock. When the code is this explicit about what it is looking for, the default answer can make sense; both would return true on uniprocessor systems.
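The idea can be modeled in user space with a toy lock type. Nothing here is the kernel's actual implementation: struct toy_spinlock and the TOY_SMP switch are invented for illustration, and only the two function names come from the proposal described above.

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's spinlock_t; not the real definition. */
struct toy_spinlock {
#ifdef TOY_SMP
    volatile int locked;
#else
    char unused; /* uniprocessor build: no lock state exists */
#endif
};

#ifdef TOY_SMP
/* Multiprocessor build: report the real state of the lock. */
static inline bool expect_spin_locked(struct toy_spinlock *lock)
{
    return lock->locked != 0;
}

static inline bool expect_spin_unlocked(struct toy_spinlock *lock)
{
    return lock->locked == 0;
}
#else
/* Uniprocessor build: the caller's expectation is always treated as met. */
static inline bool expect_spin_locked(struct toy_spinlock *lock)
{
    (void)lock;
    return true;
}

static inline bool expect_spin_unlocked(struct toy_spinlock *lock)
{
    (void)lock;
    return true;
}
#endif
```

With this split, an assertion that a lock is held and a loop waiting for a lock to become free both behave sensibly on uniprocessor builds, which a single spin_is_locked() return value cannot manage.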

localmodconfig. Many kernel testers want to build a kernel which looks like the kernel shipped with their distribution. But distributor kernels come with a configuration which builds almost everything. So our poor tester ends up waiting for a very long time as the system builds a bunch of modules which will never be used. One could avoid this problem by creating a new configuration from scratch, but that process can be a little daunting as well. There are a lot of configuration options in a contemporary kernel.

Steven Rostedt has posted a build system change intended to help with this problem. A user who types:

    make localmodconfig

will get a configuration which builds all modules currently loaded into the running system, but no others. That should be a configuration which supports the system's hardware, but which lacks hundreds of useless modules. There is also a localyesconfig option which builds the required drivers directly into the kernel.
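The first step in any such tool is obtaining the list of loaded modules. The fragment that follows is not the code from the patch; it is a C sketch which assumes the list is available in /proc/modules format (module name as the first whitespace-delimited field of each line), and both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Copy the first whitespace-delimited field of a line (the module name)
 * into out. Returns 0 on success, -1 if the name is empty or too long. */
int module_name_from_line(const char *line, char *out, size_t outlen)
{
    size_t n = strcspn(line, " \t\n");

    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, line, n);
    out[n] = '\0';
    return 0;
}

/* Print one module name per line to dest.
 * Returns the number of modules found, or -1 if path cannot be opened. */
int list_loaded_modules(const char *path, FILE *dest)
{
    char line[512], name[128];
    int count = 0;
    int at_line_start = 1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Only the start of a physical line holds a module name;
         * the tail of an over-long line is skipped. */
        if (at_line_start &&
            module_name_from_line(line, name, sizeof(name)) == 0) {
            fprintf(dest, "%s\n", name);
            count++;
        }
        at_line_start = (strchr(line, '\n') != NULL);
    }
    fclose(f);
    return count;
}
```

Calling list_loaded_modules("/proc/modules", stdout) prints the running system's modules; passing the path of a saved copy works the same way.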

Low-memory mapping - core fixes

Posted Aug 20, 2009 1:44 UTC (Thu) by jamesmrh (guest, #31622) [Link]

FYI, these changes are currently being merged into F11 and F10 (rawhide will pick them up automatically), and new kernels should be out very soon.

Addressing this at the design level produces the most flexible result:

- the sysctl (mmap_min_addr) cannot be overridden at all by MAC security policy (e.g. SELinux)

- it can only be overridden with CAP_SYS_RAWIO

- if the sysctl is disabled to allow e.g. wine to run, MAC security policy can be used to add further restrictions to ensure that only wine can perform the mapping, and nothing else. i.e. running wine does not mean degrading security for the entire system.

Yay for localyesconfig

Posted Aug 20, 2009 9:25 UTC (Thu) by alex (subscriber, #1355) [Link]

Props to Steven Rostedt for implementing this. I'd been mulling over the usefulness of this option ever since I built a kernel for my Atom-powered netbook which, as you can expect, takes a long, long time. I suspect it will still take a while, but at least it will be a while spent compiling useful stuff :-)

Helpful localmodconfig

Posted Aug 20, 2009 16:59 UTC (Thu) by sjayaraman (guest, #48013) [Link]

Kudos!

I think Steven Rostedt earlier posted a Perl script to do this, called streamline_config.pl, which was useful. This is even better - making it part of the build system!

In brief

Posted Aug 21, 2009 2:05 UTC (Fri) by eparis123 (guest, #59739) [Link]

I really like the idea of the new option. It saves a lot of time wandering through the huge list of kernel options.

localmodconfig

Posted Aug 22, 2009 4:09 UTC (Sat) by dirtyepic (guest, #30178) [Link] (7 responses)

i'm guessing this wouldn't catch modules that are dynamically loaded when needed, like, off the top of my head, microcode?

localmodconfig

Posted Aug 23, 2009 17:07 UTC (Sun) by nevets (subscriber, #11875) [Link] (6 responses)

It catches all modules that have been loaded. Anything listed in 'lsmod'.

localmodconfig

Posted Aug 23, 2009 21:57 UTC (Sun) by dirtyepic (guest, #30178) [Link] (5 responses)

and microcode built as a module is loaded once during boot to update the microcode, and then immediately unloaded, so the answer would be "no". I'm guessing things like cpufreq modules that aren't currently in use would also be missed.

localmodconfig

Posted Aug 23, 2009 22:17 UTC (Sun) by mjg59 (subscriber, #23239) [Link] (4 responses)

We don't generally automatically unload unused modules these days, especially since there's no good way to tell the difference between an unused module and one that's in use but has a reference count of zero. Anything that's loaded at boot is probably still loaded.

localmodconfig

Posted Aug 24, 2009 2:19 UTC (Mon) by ABCD (subscriber, #53650) [Link] (3 responses)

> Anything that's loaded at boot is probably still loaded.

That usually isn't true in the special case of the microcode module, as the various distributions' init scripts generally load the module, update the microcode, then immediately and explicitly unload the module.

localmodconfig

Posted Aug 24, 2009 14:21 UTC (Mon) by nevets (subscriber, #11875) [Link] (2 responses)

If there's a way to know about these, then certainly let me know. Email me at rostedt@goodmis.org. The localmodconfig is a start. But if we are missing necessary modules to boot the kernel, then we need to find a way to fix that.

localmodconfig

Posted Aug 25, 2009 1:35 UTC (Tue) by dirtyepic (guest, #30178) [Link]

that's the only one i can think of. i was mistaken about the cpufreq stuff of course. :) nice job, btw, this is something that was sorely needed.

localmodconfig

Posted Sep 22, 2009 19:49 UTC (Tue) by BobRobertson (guest, #2048) [Link]

I found out about this after posting a "wish list" to lxer.com which included building a kernel .config that had only the actually used modules compiled in, and everything else as loadable modules.

I'll add my own Thank You to the list.

In brief

Posted Aug 22, 2009 10:28 UTC (Sat) by dlang (guest, #313) [Link] (7 responses)

do the local*config targets have some way of pointing you at a file from another server instead of (I assume) /proc/modules?

there needs to be some way to gather the info from one (probably low-powered) system and take it to another system to actually do the build on.

In brief

Posted Aug 22, 2009 17:26 UTC (Sat) by rvfh (guest, #31018) [Link] (6 responses)

I suppose you can do local*config on the low-power machine and interrupt it when it starts compiling, then copy the .config file across. Make sure you have the right compiler and compiler options though...

In brief

Posted Aug 23, 2009 3:21 UTC (Sun) by jzbiciak (guest, #5246) [Link] (5 responses)

It looks like all this make target does is generate the .config. You'd still have to build the kernel and modules afterwards. (At least, that's the impression I got from the linked email.)

In brief

Posted Aug 23, 2009 17:11 UTC (Sun) by nevets (subscriber, #11875) [Link] (4 responses)

Yes exactly. It is just like the other "make *config". It only creates a config file and nothing more. I've used it on embedded boards to get a proper config. I would NFS mount the source, log into the embedded device, cd to the NFS mounted directory and run the "make localmodconfig" (well this is actually a lie, since I really just manually ran streamline_config.pl, but this should also work), and then built the resulting config on a faster box.

In brief

Posted Aug 25, 2009 21:01 UTC (Tue) by mb (subscriber, #50428) [Link] (3 responses)

Well, but many small and embedded devices don't even have make or perl. So isn't there some way to tell the script to look somewhere other than /proc/modules (or wherever it looks)? Then I could simply scp the module list from the embedded machine to the build host and do the rest on the big machine.

In brief

Posted Aug 25, 2009 23:01 UTC (Tue) by nevets (subscriber, #11875) [Link] (2 responses)

Yeah, that looks like we can add an enhancement. Perhaps add a "make LOADED_MODULES=embedded.lsmod localmodconfig", where embedded.lsmod is a filename holding the modules of the embedded device. Or have it use an environment variable to find the list of modules. If the environment variable does not exist, then it uses lsmod or /proc/modules. It currently does lsmod, but perhaps it should use /proc/modules directly.

Thanks.

In brief

Posted Aug 25, 2009 23:09 UTC (Tue) by dlang (guest, #313) [Link] (1 responses)

if this just executes lsmod and parses the result, then the user can create an lsmod earlier in their path that echoes the results from a different box

or you could allow the user to specify a command to get the data (defaulting to lsmod if nothing is specified)

In brief

Posted Sep 3, 2009 20:35 UTC (Thu) by kabloom (guest, #59417) [Link]

Both of those are a bit of overkill. Why create an executable to generate the data when a file will do?

local*config in Linux Next

Posted Aug 25, 2009 0:39 UTC (Tue) by nevets (subscriber, #11875) [Link]

local{mod,yes}config is now in the master branch of Linux Next:

    git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Play with it and report anything that breaks.

Note, if you are using git, then you can do the following as well:

    git remote add next git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
    git remote update
    git checkout next/master
    make localmodconfig
    git checkout <previous branch>

Now you have a streamlined config, even if you do not have the option in your current development branch.