By Michael Kerrisk
October 10, 2012
There are many mechanisms for communicating information between
user-space applications and the kernel. System calls and pseudo-filesystems
such as /proc and /sys are of course the best known.
Signals are similarly well known; the kernel employs signals to
inform a process of various synchronous or asynchronous events—for
example, when the process tries to write to a broken pipe or a child of the
process terminates.
There are also a number of more obscure mechanisms for communication
between the kernel and user space. These include the Linux-specific netlink sockets and user-mode
helper features. Netlink sockets provide a socket-style API for
exchanging information with the kernel. The user-mode helper feature allows
the kernel to automatically invoke user-space executables; this mechanism
is used in a number of places, including the implementation of control
groups and piping core dumps
to a user-space application.
The auxiliary vector, a mechanism for communicating information from
the kernel to user space, has until now remained largely invisible.
However, it has become more visible with the addition of a new library
function, getauxval(), in the GNU C library (glibc) 2.16 release that
appeared at the end of June.
Historically, many UNIX systems have implemented the auxiliary vector
feature. In essence, it is a list of key-value pairs that the kernel's ELF
binary loader (fs/binfmt_elf.c in the kernel source) constructs
when a new executable image is loaded into a process. This list is placed
at a specific location in the process's address space; on Linux systems it
sits at the high end of the user address space, just above the (downwardly
growing) stack, the command-line arguments (argv), and environment
variables (environ).
From the description and diagram, we can see that although the
auxiliary vector is somewhat hidden, it is accessible with a little
effort. Even without using the new library function, an application that
wants to access the auxiliary vector merely needs to obtain the address of
the location that follows the NULL pointer at the end of the
environment list. Furthermore, at the shell level, we can discover the
auxiliary vector that was supplied to an executable by setting the
LD_SHOW_AUXV environment variable when launching an application:
$ LD_SHOW_AUXV=1 sleep 1000
AT_SYSINFO_EHDR: 0x7fff35d0d000
AT_HWCAP: bfebfbff
AT_PAGESZ: 4096
AT_CLKTCK: 100
AT_PHDR: 0x400040
AT_PHENT: 56
AT_PHNUM: 9
AT_BASE: 0x0
AT_FLAGS: 0x0
AT_ENTRY: 0x40164c
AT_UID: 1000
AT_EUID: 1000
AT_GID: 1000
AT_EGID: 1000
AT_SECURE: 0
AT_RANDOM: 0x7fff35c2a209
AT_EXECFN: /usr/bin/sleep
AT_PLATFORM: x86_64
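The envp-based approach mentioned above can be sketched in C as follows. This is an illustration of the memory layout rather than a recommended technique (getauxval() is the cleaner interface). It assumes Linux with glibc or musl, where <link.h> supplies the ElfW() macro that selects the Elf32_ or Elf64_ variant of auxv_t for the platform; auxv_lookup() is a hypothetical name.

```c
#include <link.h>       /* ElfW(): selects Elf32_ or Elf64_ types for this platform */
#include <stddef.h>     /* NULL */
#include <sys/auxv.h>   /* AT_* constants (via <elf.h>) and getauxval() */

/* POSIX leaves it to the application to declare environ; declared here so
   that callers can pass it (only while it is still the original array). */
extern char **environ;

/* Hypothetical helper: find TYPE in the auxiliary vector by skipping past
   the NULL pointer that terminates ENVP. ENVP must be the environment array
   the process started with: main()'s third argument, or environ before any
   setenv()/putenv() call (glibc may then move environ to a new array).
   Returns 0 if TYPE is not present. */
static unsigned long auxv_lookup(char **envp, unsigned long type)
{
    while (*envp != NULL)       /* skip the environment pointers */
        envp++;

    /* The vector starts right after the terminating NULL pointer and
       ends with an AT_NULL entry. */
    for (ElfW(auxv_t) *a = (ElfW(auxv_t) *) (envp + 1); a->a_type != AT_NULL; a++)
        if (a->a_type == type)
            return a->a_un.a_val;
    return 0;
}
```

In a program, the call would typically be auxv_lookup(envp, AT_PAGESZ), using the third argument of int main(int argc, char *argv[], char *envp[]), which is a common extension rather than standard C.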
The auxiliary vector of each process on the system is also visible via
a corresponding /proc/PID/auxv file. Dumping the contents of the
file that corresponds to the above command (as eight-byte decimal numbers,
because the keys and values are of that size on the 64-bit system used for
this example), we can see the key-value pairs in the vector, followed by a
pair of zero values that indicate the end of the vector:
$ od -t d8 /proc/15558/auxv
0000000 33 140734096265216
0000020 16 3219913727
0000040 6 4096
0000060 17 100
0000100 3 4194368
0000120 4 56
0000140 5 9
0000160 7 0
0000200 8 0
0000220 9 4200012
0000240 11 1000
0000260 12 1000
0000300 13 1000
0000320 14 1000
0000340 23 0
0000360 25 140734095335945
0000400 31 140734095347689
0000420 15 140734095335961
0000440 0 0
0000460
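Reading /proc/self/auxv from a program follows the same layout as the od dump: fixed-size (type, value) pairs ending with a pair of zeros. The sketch below assumes Linux with glibc or musl (for ElfW()) and that the reader has the same word size as the process whose file it reads (trivially true here, since it reads its own file); proc_auxv_lookup() and proc_auxv_agrees() are hypothetical names.

```c
#include <link.h>       /* ElfW() */
#include <stdio.h>
#include <sys/auxv.h>   /* AT_* constants and getauxval() */

/* Hypothetical helper: look up TYPE in /proc/self/auxv, which holds the
   same native-word-sized (type, value) pairs as the in-memory vector,
   terminated by an AT_NULL (0, 0) pair. Returns 1 and stores the value
   in *value if found, 0 otherwise. */
static int proc_auxv_lookup(unsigned long type, unsigned long *value)
{
    FILE *f = fopen("/proc/self/auxv", "rb");
    if (f == NULL)
        return 0;

    ElfW(auxv_t) entry;
    int found = 0;
    /* Stop at end of file or at the AT_NULL terminator. */
    while (fread(&entry, sizeof entry, 1, f) == 1 && entry.a_type != AT_NULL) {
        if (entry.a_type == type) {
            *value = entry.a_un.a_val;
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Cross-check: does the file agree with getauxval() for TYPE? */
static int proc_auxv_agrees(unsigned long type)
{
    unsigned long v;
    return proc_auxv_lookup(type, &v) && v == getauxval(type);
}
```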
Scanning the high end of user-space memory or /proc/PID/auxv
is a clumsy way of retrieving values from the auxiliary vector. The new
library function provides a simpler mechanism for retrieving individual
values from the list:
#include <sys/auxv.h>
unsigned long int getauxval(unsigned long int type);
The function takes a key as its single argument and returns the
corresponding value, or 0 if the key is not present. The glibc header files define a set of symbolic
constants with names of the form AT_* for the key value passed to
getauxval(); these names are exactly the same as the strings
displayed when executing a command with LD_SHOW_AUXV=1.
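A minimal sketch of calling getauxval() follows. One subtlety: the function returns 0 both for an absent key and for a key whose value is 0 (AT_SECURE, for example). glibc 2.19 and later additionally set errno to ENOENT for an absent key; that behavior postdates the 2.16 release discussed here. The wrapper names auxv_get() and page_size() are hypothetical.

```c
#include <errno.h>
#include <sys/auxv.h>
#include <unistd.h>     /* sysconf() */

/* Hypothetical wrapper that distinguishes "key absent" from "value is 0".
   Relies on errno being set to ENOENT for an absent key (glibc >= 2.19);
   with glibc 2.16 to 2.18 an absent key looks like a zero value and this
   returns 1. */
static int auxv_get(unsigned long type, unsigned long *value)
{
    errno = 0;
    unsigned long v = getauxval(type);
    if (v == 0 && errno == ENOENT)
        return 0;
    *value = v;
    return 1;
}

/* Example use: the page size from AT_PAGESZ, falling back to sysconf(). */
static long page_size(void)
{
    unsigned long v;
    if (auxv_get(AT_PAGESZ, &v) && v != 0)
        return (long) v;
    return sysconf(_SC_PAGESIZE);
}
```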
Of course, the obvious question by now is: what sort of information is
placed in the auxiliary vector, and who needs that information? The
primary customer of the auxiliary vector is the dynamic linker
(ld-linux.so). In the usual scheme of things, the kernel's ELF
binary loader constructs a process image by loading an executable into the
process's memory, and likewise loading the dynamic linker into memory. At
this point, the dynamic linker is ready to take over the task of loading
any shared libraries that the program may need in preparation for handing
control to the program itself. However, it lacks some pieces of
information that are essential for these tasks: the location of the program
inside the virtual address space, and the starting address at which
execution of the program should commence.
In theory, the kernel could provide a system call that the dynamic
linker could use in order to obtain the required information. However, this
would be an inefficient way of doing things: the kernel's program loader already has the information (because it has scanned the
ELF binary and built the process image) and
knows that the dynamic linker will need it. Rather than maintaining a
record of this information until the dynamic linker requests it, the kernel
can simply make it available in the process image at some location known to
the dynamic linker. That location is, of course, the auxiliary vector.
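To illustrate how this information locates the program in the address space, the following sketch walks the executable's program headers via AT_PHDR and AT_PHNUM, computes the load bias (runtime address minus link-time address) from the PT_PHDR entry, and checks that AT_ENTRY falls inside a loaded segment. It assumes Linux with glibc or musl; the helper names are hypothetical, and it assumes an executable without PT_PHDR was not relocated. For a traditional non-PIE executable such as the sleep binary above (AT_PHDR 0x400040) the bias is 0; for a position-independent executable it is nonzero.

```c
#include <link.h>       /* ElfW(); PT_* constants via <elf.h> */
#include <sys/auxv.h>

/* Hypothetical helper: the executable's load bias. glibc's dynamic linker
   derives it in a similar way, from the PT_PHDR entry, whose p_vaddr is the
   link-time address of the program headers that AT_PHDR locates at run
   time. Without PT_PHDR, assume no relocation (bias 0). */
static ElfW(Addr) exe_load_bias(void)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) getauxval(AT_PHDR);
    unsigned long phnum = getauxval(AT_PHNUM);

    for (unsigned long i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_PHDR)
            return (ElfW(Addr)) phdr - phdr[i].p_vaddr;
    return 0;
}

/* Returns 1 if AT_ENTRY lies inside one of the executable's PT_LOAD
   segments, i.e., the kernel-supplied entry point is in the loaded image. */
static int entry_in_load_segment(void)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) getauxval(AT_PHDR);
    unsigned long phnum = getauxval(AT_PHNUM);
    ElfW(Addr) bias = exe_load_bias();
    ElfW(Addr) entry = getauxval(AT_ENTRY);

    for (unsigned long i = 0; i < phnum; i++) {
        if (phdr[i].p_type != PT_LOAD)
            continue;
        ElfW(Addr) start = bias + phdr[i].p_vaddr;
        if (entry >= start && entry < start + phdr[i].p_memsz)
            return 1;
    }
    return 0;
}
```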
It turns out that there's a range of other information that the
kernel's program loader already has and which it knows the dynamic linker
will need. By placing all of this information in the auxiliary vector, the
kernel either saves the programming overhead of making this information
available in some other way (e.g., by implementing a dedicated system
call), or saves the dynamic linker the cost of making a system call, or
both. Among the values placed in the auxiliary vector and available via
getauxval() are the following:
- AT_PHDR and AT_ENTRY: The values for these keys
are, respectively, the address of the executable's ELF program headers and
the executable's entry address. (AT_PHENT and AT_PHNUM give the size and
number of those program headers.) The dynamic linker uses this information
to perform linking and pass control to the executable.
- AT_SECURE: The kernel assigns a nonzero value to this key
if this executable should be treated securely. This setting may be
triggered by a Linux Security Module, but the most common reason is that the
kernel recognizes that the process is executing a set-user-ID or
set-group-ID program. In this case, the dynamic linker disables the use of
certain environment variables (as described in the ld-linux.so(8)
manual page) and the C library changes other aspects of its behavior.
- AT_UID, AT_EUID, AT_GID, and
AT_EGID: These are the real and effective user and group IDs of
the process. Making these values available in the vector saves the dynamic
linker the cost of making system calls to determine the values. If the
AT_SECURE value is not available, the dynamic linker uses these
values to make a decision about whether to handle the executable securely.
- AT_PAGESZ: The value is the system page size. The
dynamic linker needs this information during the linking phase, and the C
library uses it in the implementation of the malloc family of
functions.
- AT_PLATFORM: The value is a pointer to a string
identifying the hardware platform on which the program is running. In some
circumstances, the dynamic linker uses this value in the interpretation of
rpath values. (The ld-linux.so(8) man page describes
rpath values.)
- AT_SYSINFO_EHDR: The value is a pointer to the page
containing the Virtual Dynamic Shared Object (VDSO) that the kernel creates
in order to provide fast implementations of certain system calls. (Some
documentation on the VDSO can be found in the kernel source file
Documentation/ABI/stable/vdso.)
- AT_HWCAP: The value is a multibyte mask of
bits whose settings indicate detailed processor capabilities. This
information can be used to provide optimized behavior for certain library
functions. The contents of the bit mask are hardware dependent (for
example, see the kernel source file
arch/x86/include/asm/cpufeature.h for details relating to the
Intel x86 architecture).
- AT_RANDOM: The value is a pointer to sixteen random bytes
provided by the kernel. The dynamic linker uses this to implement a stack
canary.
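The sketch below queries several of the entries just listed. It shows which values are plain numbers (AT_SECURE, AT_PAGESZ, AT_HWCAP, the IDs) and which are pointers into the process's memory (AT_PLATFORM, AT_RANDOM), and cross-checks the ID entries against the corresponding system calls. The helper names are hypothetical, and the bits printed for AT_HWCAP are not decoded because their meaning is architecture dependent.

```c
#include <stdio.h>
#include <sys/auxv.h>
#include <unistd.h>     /* getuid() and friends */

/* Returns 1 if the IDs recorded in the auxiliary vector match those
   reported by the corresponding system calls. */
static int auxv_ids_match(void)
{
    return getauxval(AT_UID) == (unsigned long) getuid() &&
           getauxval(AT_EUID) == (unsigned long) geteuid() &&
           getauxval(AT_GID) == (unsigned long) getgid() &&
           getauxval(AT_EGID) == (unsigned long) getegid();
}

/* Print some of the entries described above. */
static void show_auxv(void)
{
    printf("AT_SECURE:   %lu\n", getauxval(AT_SECURE));
    printf("AT_PAGESZ:   %lu\n", getauxval(AT_PAGESZ));
    printf("AT_HWCAP:    0x%lx\n", getauxval(AT_HWCAP));   /* the mask itself */

    /* AT_PLATFORM is a pointer to a NUL-terminated string. */
    const char *platform = (const char *) getauxval(AT_PLATFORM);
    printf("AT_PLATFORM: %s\n", platform != NULL ? platform : "(absent)");

    /* AT_RANDOM is a pointer to 16 random bytes. */
    const unsigned char *rnd = (const unsigned char *) getauxval(AT_RANDOM);
    printf("AT_RANDOM:  ");
    if (rnd != NULL)
        for (int i = 0; i < 16; i++)
            printf(" %02x", rnd[i]);
    printf("\n");
}
```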
The precise reasons why the GNU C library developers have chosen to add
the getauxval() function now are a little unclear. The commit
message and NEWS file entry for the change were merely brief explanations
of what the change was, rather than why it was made. The only clue provided by the implementer on the
libc-alpha mailing list suggested that doing so was useful to allow for
"future enhancements to the AT_ values, especially target-specific
ones." That comment, plus the observation that the glibc developers
tend to be rather conservative about adding new interfaces to the ABI,
suggests that they have some interesting new user-space uses of the
auxiliary vector in mind.
Brief items
I recommend NOT assuming that package managers are the cat's pajamas
and that therefore we can all skip the ability to usefully build from
source.
—
John Gilmore
The KDE Manifesto has been released. "The KDE Manifesto is not intended to change the organization or the way it works. Its aim is only to describe how the KDE Community sees itself. What binds us together are certain values and their practical implications, without regard for who a person is or what background and skills they bring. It is a living document, so it will change over time as KDE continues to grow and mature. We are sharing the Manifesto to help people understand what KDE is all about, what we want to accomplish and why we do what we do."
Mozilla has released Firefox 16. See the details in the release notes. Firefox 16.0 is also available for Android. Here are the Android version release notes.
The Electronic Frontier Foundation (EFF) has released version 3.0 of HTTPS Everywhere. HTTPS Everywhere 3.0 adds encryption protection to 1,500 more websites, twice as many as previous stable releases. "Our current estimate is that HTTPS Everywhere 3 should encrypt at least a hundred billion page views in the next year, and trillions of individual HTTP requests."
Version 2.0 of the system diagnostic framework SystemTap has been released. This release adds a simple macro facility to the built-in scripting language, the ability to conditionally vary code based on the user's privilege level, and an experimental backend that allows SystemTap to profile a user's own processes (i.e., without root privileges).
Antoine Martin wrote in to alert us to the latest release of xpra, the "screen for X" utility. This release includes a host of new features, including several video compression formats and experimental support for multiple, concurrent clients.
At its open.NASA blog, the US space agency is soliciting input from the public on the data sets and APIs it provides. "As we collect more and more data, figuring out the best way to distribute, use, and reuse the data becomes more and more difficult. API’s are one way we can significantly lower the barrier of entry to people from outside NASA being able to manipulate and access our public information." The current estimate is that NASA collects 15 terabytes of data per day, and future missions may collect far more.
Newsletters and articles
Over at opensource.com, Ruth Suehle reports on the Open Hardware Summit, which was recently held in New York. At the summit, the Open Source Hardware Association was officially launched and various ideas about open hardware business strategies were discussed. "Many in the audience were waiting for the afternoon session that included Bre Pettis, co-founder and CEO of MakerBot, creators of a popular open source 3D printer. Earlier in the week, the company announced its latest product, the Replicator 2 3D printer. At the same time, Pettis announced to much controversy, 'For the Replicator 2, we will not share the way the physical machine is designed or our GUI because we don’t think carbon-copy cloning is acceptable and carbon-copy clones undermine our ability to pay people to do development.'"
The H creates a standalone mobile telephone network using the sysmoBTS base station. "In previous articles, we've looked at the question of how free are the phones that people use every day, and looked at the theory behind building your own GSM phone network using open source software. Now, in this article we take a look at the sysmoBTS, a small form-factor GSM Base Transceiver Station (BTS) built around these principles and the steps required to configure it to provide a standalone mobile telephone network that is useful for research, development and testing purposes."
For anybody wanting to work with OpenStreetMap data using PostgreSQL, here's a collection of useful tools and techniques. "At first glance, OSM data and Postgres (specifically PostGIS) seem like a natural, easy fit for one another: OSM is vector data, PostGIS stores vector data. OSM has usernames and dates-modified, PostGIS has columns for storing those things in tables. OSM is a worldwide dataset, PostGIS has fast spatial indexes to get to the part you want. When you get to OSM’s free-form tags, though, the row/column model of Postgres stops making sense and you start to reach for linking tables or advanced features like hstore."
At the Public Knowledge blog, Michael Weinberg addresses the differing legal underpinnings of open source hardware and open source software. "This combination – copyright that does not protect function, trademark that needs to be applied for and does not protect function, and patents that need to be applied for and can protect functions – means that most hardware projects are 'open' by default because their core functionality is not protected by any sort of intellectual property right. Of course, in this case 'open' means that their key functionality can be copied without legal repercussion, not that the schematics have been posted online or that it is easy to discover how they work (critical elements of open source hardware)." The article is an extension of Weinberg's recent talk at the Open Hardware Summit, and raises questions that are interesting in light of MakerBot's announcement that its latest 3D printer would not be open.
Page editor: Nathan Willis